Abstract

In the kernel clustering problem we are given a large $n \times n$ positive semi-definite matrix $A = (a_{ij})$ with $\sum_{i,j=1}^n a_{ij} = 0$ and a small $k \times k$ positive semi-definite matrix $B = (b_{ij})$. The goal is to find a partition $S_1, \ldots, S_k$ of $\{1, \ldots, n\}$ which maximizes the quantity

$$\sum_{i,j=1}^k \Big( \sum_{(p,q) \in S_i \times S_j} a_{pq} \Big) b_{ij}.$$

We study the computational complexity of this generic clustering problem which originates in the theory of machine learning. We design a constant factor polynomial time approximation algorithm for this problem, answering a question posed by Song, Smola, Gretton and Borgwardt. In some cases we manage to compute the sharp approximation threshold for this problem assuming the Unique Games Conjecture (UGC). In particular, when $B$ is the $3 \times 3$ identity matrix the UGC hardness threshold of this problem is exactly $\frac{16\pi}{27}$. We present and study a geometric conjecture of independent interest which we show would imply that the UGC threshold when $B$ is the $k \times k$ identity matrix is $\frac{8\pi}{9}\left(1 - \frac{1}{k}\right)$ for every $k \ge 3$.

1 Introduction

This paper is devoted to an investigation of the polynomial time approximability of a generic clustering problem which originates in the theory of machine learning. In doing so, we uncover a connection with a continuous geometric/analytic problem which is of independent interest. In [23] Song, Smola, Gretton and Borgwardt introduced the following framework for kernel clustering problems. Assume that we are given a centered kernel, i.e. an $n \times n$ positive semidefinite matrix $A = (a_{ij})$ with real entries such that $\sum_{i,j=1}^n a_{ij} = 0$ (the assumption that the kernel is centered is a commonly used normalization in learning theory; see [22] for more information on this topic). Such matrices arise, for example, as correlation matrices of random variables $(X_1, \ldots, X_n)$ that measure attributes of certain empirical data, i.e. $a_{ij} = \mathbb{E}[X_i X_j]$. We think of $n$ as very large, and our goal is to "cluster" the matrix $A$ to a much smaller $k \times k$ matrix in such a way that certain features could still be extracted from the clustered matrix. Formally, given a partition of $\{1, \ldots, n\}$ into $k$ sets $S_1, \ldots, S_k$, define the clustering of $A$ with respect to this partition to be the $k \times k$ matrix whose $(i,j)$th entry is

$$\sum_{(p,q) \in S_i \times S_j} a_{pq}. \tag{1}$$

Let $A(S_1, \ldots, S_k)$ denote the $k \times k$ matrix given by (1). In the kernel clustering problem, we are given a positive semidefinite $k \times k$ matrix $B = (b_{ij})$, and we wish to find the clustering $A(S_1, \ldots, S_k) = C = (c_{ij})$ of $A$ which is most similar to $B$ in the sense that $\sum_{i,j=1}^k c_{ij} b_{ij}$, i.e. its scalar product with $B$, is as large as possible. In other words, our goal is to compute the number (and the corresponding partition):

$$\begin{aligned}
\mathrm{Clust}(A|B) &:= \max\Big\{ \sum_{i,j=1}^k \Big( \sum_{(p,q) \in S_i \times S_j} a_{pq} \Big) b_{ij} :\ \{S_1, \ldots, S_k\} \text{ is a partition of } \{1, \ldots, n\} \Big\} \\
&= \max\Big\{ \sum_{i,j=1}^k A(S_1, \ldots, S_k)_{ij} \cdot b_{ij} :\ \{S_1, \ldots, S_k\} \text{ is a partition of } \{1, \ldots, n\} \Big\} \\
&= \max\Big\{ \sum_{i,j=1}^n a_{ij} b_{\sigma(i)\sigma(j)} :\ \sigma : \{1, \ldots, n\} \to \{1, \ldots, k\} \Big\}.
\end{aligned} \tag{2}$$



The flexibility in the above formulation of the kernel clustering problem is clearly in the choice of 
comparison matrix B, which allows us to enforce a wide-range of clustering criteria. Using the statistical 
interpretation of (aij) as a correlation matrix, we can think of the matrix B as encoding our belief/hypothesis 
that the empirical data has a certain structure, and the kernel clustering problem aims to efficiently expose 
this structure. 
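For very small instances the definition in (2) can be evaluated directly. The following Python sketch enumerates all $k^n$ assignments $\sigma$ (with 0-based labels), so it is only a sanity check of the definition and not a practical algorithm. The function names and the example matrices are illustrative assumptions, not taken from [23].

```python
from itertools import product

def clustered_matrix(A, sigma, k):
    """k x k clustering of A as in (1): entry (s, t) sums a_pq over p in S_s, q in S_t."""
    n = len(A)
    C = [[0.0] * k for _ in range(k)]
    for p in range(n):
        for q in range(n):
            C[sigma[p]][sigma[q]] += A[p][q]
    return C

def objective(A, B, sigma):
    """sum_{i,j} a_ij * b_{sigma(i) sigma(j)}, the last expression in (2)."""
    n = len(A)
    return sum(A[i][j] * B[sigma[i]][sigma[j]] for i in range(n) for j in range(n))

def clust_brute_force(A, B):
    """Exhaustive maximum over all k^n assignments; feasible only for tiny n."""
    n, k = len(A), len(B)
    best = max(product(range(k), repeat=n), key=lambda s: objective(A, B, s))
    return objective(A, B, best), best

# Illustrative centered positive semidefinite A (rows and columns sum to zero).
A = [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]
B = [[1, -1], [-1, 1]]
value, sigma = clust_brute_force(A, B)
print(value, sigma)
```

The first and last expressions in (2) agree because the scalar product of `clustered_matrix(A, sigma, k)` with `B` is the same sum, only grouped by cluster.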

Several explicit examples of useful "test matrices" $B$ are discussed in [23], including hierarchical clustering and clustering data on certain manifolds. We refer to [23] for additional information which illustrates the versatility of this general clustering problem, including its relation to the Hilbert-Schmidt Independence Criterion (HSIC) and various experimental results. In [23] it was asked if there is a polynomial time approximation algorithm for computing $\mathrm{Clust}(A|B)$. Here we obtain a constant factor approximation algorithm for this problem, and prove some computational hardness of approximation results.

Before stating our results in full generality we shall now present a few simple illustrative examples. If $B = I_k$ is the $k \times k$ identity matrix, then thinking once more of $a_{ij}$ as correlations $\mathbb{E}[X_i X_j]$, our goal is to find a partition $S_1, \ldots, S_k$ of $\{1, \ldots, n\}$ which maximizes the quantity

$$\sum_{i=1}^k \sum_{p,q \in S_i} a_{pq},$$

i.e. we wish to cluster the variables so as to maximize the total intra-cluster correlations. As we shall see below, our results yield a polynomial time algorithm which approximates $\mathrm{Clust}(A|I_k)$ up to a factor of $\frac{8\pi}{9}\left(1 - \frac{1}{k}\right)$. In particular, when $k = 3$ we obtain a $\frac{16\pi}{27}$ approximation algorithm, and we show that assuming the Unique Games Conjecture (UGC) no polynomial time algorithm can achieve an approximation guarantee which is smaller than $\frac{16\pi}{27}$. The Unique Games Conjecture was posed by Khot in [13], and it will be described momentarily. For the readers who are not familiar with this computational hypothesis and its remarkable applications to hardness of approximation, it suffices to say that this hardness result should be viewed as strong evidence that $\frac{16\pi}{27}$ is the sharp threshold below which no polynomial time algorithm can solve the kernel clustering problem when $B = I_3$. Moreover, we conjecture that $\frac{8\pi}{9}\left(1 - \frac{1}{k}\right)$ is the sharp approximability threshold (assuming UGC) for $\mathrm{Clust}(A|I_k)$ for every $k \ge 3$. In this paper, we reduce this conjecture to a purely geometric/analytic conjecture, which we will describe in detail later, and prove some partial results about it.



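Another illustrative example of the kernel clustering problem is the case

$$B = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}.$$

In this case, we clearly have

$$\mathrm{Clust}\left(A \,\middle|\, \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}\right) = \max\Big\{ \sum_{i,j=1}^n a_{ij}\varepsilon_i\varepsilon_j :\ \varepsilon_1, \ldots, \varepsilon_n \in \{-1, 1\} \Big\}. \tag{3}$$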

The optimization problem in (3) is well known as the positive semi-definite Grothendieck problem and has several algorithmic applications (see EOl, [18], [2], [5]). It has been shown by Rietz [20] that the natural semidefinite relaxation of (3) has integrality gap $\frac{\pi}{2}$ (see also Nesterov's work fW\). Our results imply that assuming the UGC $\frac{\pi}{2}$ is the sharp approximation threshold for the positive-semidefinite Grothendieck problem. Note that without the assumption that $A$ is positive semidefinite the natural semidefinite relaxation of (3) has integrality gap $\Theta(\log n)$. See [17, 16, 1] for more information, and IS for hardness results for this problem.

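We can also view the problem (3) as a generalization of the MaxCut problem. Indeed, let $G = (V = \{1, \ldots, n\}, E)$ be an $n$-vertex loop-free graph. For every vertex $i \in V$ let $d_i$ denote its degree in $G$. Let $A$ be the Laplacian of $G$, i.e. $A$ is the $n \times n$ matrix given by

$$a_{ij} = \begin{cases} d_i & \text{if } i = j, \\ -1 & \text{if } i \neq j \wedge ij \in E, \\ 0 & \text{if } i \neq j \wedge ij \notin E. \end{cases} \tag{4}$$

Then $A$ is positive semi-definite since it is diagonally dominant. For every $\varepsilon_1, \ldots, \varepsilon_n \in \{-1, 1\}$ let $S \subseteq V$ be the set $S = \{i \in V : \varepsilon_i = 1\}$. Then:

$$\begin{aligned}
\sum_{i,j=1}^n a_{ij}\varepsilon_i\varepsilon_j &= \sum_{i=1}^n d_i - 2|E(S,S)| - 2|E(V \setminus S, V \setminus S)| + 2|E(S, V \setminus S)| \\
&= 2|E| - 2\big(|E| - |E(S, V \setminus S)|\big) + 2|E(S, V \setminus S)| = 4|E(S, V \setminus S)|.
\end{aligned} \tag{5}$$

Hence

$$\mathrm{Clust}\left(A \,\middle|\, \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}\right) = 4\,\mathrm{MaxCut}(G).$$

Using Håstad's inapproximability result for MaxCut [11] it follows that if $P \neq NP$ there is no polynomial time algorithm which approximates (3) up to a factor smaller than $\frac{17}{16}$.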
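The identity (5) is easy to confirm by machine on a small graph. The sketch below, in plain Python, builds the Laplacian of (4) and compares the quadratic form with four times the cut size for every sign vector. The example graph and the helper names are illustrative choices and are not part of the paper.

```python
from itertools import product

def laplacian(n, edges):
    """Graph Laplacian as in (4): degrees on the diagonal, -1 for each edge."""
    L = [[0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1
        L[j][j] += 1
        L[i][j] -= 1
        L[j][i] -= 1
    return L

def quadratic_form(L, eps):
    """sum_{i,j} a_ij * eps_i * eps_j."""
    n = len(L)
    return sum(L[i][j] * eps[i] * eps[j] for i in range(n) for j in range(n))

def cut_size(edges, eps):
    """Number of edges with endpoints on different sides of the sign vector."""
    return sum(1 for i, j in edges if eps[i] != eps[j])

# Illustrative graph: a 4-cycle with one chord.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
L = laplacian(4, edges)
signs = list(product((-1, 1), repeat=4))
identity_holds = all(quadratic_form(L, e) == 4 * cut_size(edges, e) for e in signs)
max_form = max(quadratic_form(L, e) for e in signs)
print(identity_holds, max_form)
```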

Our algorithmic results. For a fixed positive semidefinite matrix $B$, the approximability threshold for the problem of computing $\mathrm{Clust}(A|B)$ depends on $B$. It is therefore of interest to study the performance of our algorithms in terms of the matrix $B$. We do obtain bounds which depend on $B$ (which are probably suboptimal in general); the precise statements are contained in Theorem 2.1 and Theorem 2.3. For the sake of simplicity, in the introduction we state bounds which are independent of $B$. We believe that the problem of computing the approximation threshold (perhaps under UGC) for each fixed $B$ is an interesting problem which deserves further research.

If $A$ is centered, i.e. $\sum_{i,j=1}^n a_{ij} = 0$, then for every $k \times k$ positive semi-definite matrix $B$ our algorithm achieves an approximation ratio of $\pi\left(1 - \frac{1}{k}\right)$. If, in addition, $B$ is centered and spherical, i.e. $\sum_{i,j=1}^k b_{ij} = 0$ and $b_{ii} = 1$ for all $i$, then our algorithm achieves an approximation ratio of $\frac{8\pi}{9}\left(1 - \frac{1}{k}\right)$. This ratio is also valid if $B$ is the identity matrix, and as we mentioned above, we believe that this approximation guarantee cannot be improved assuming the UGC (and here we prove this conjecture for $k = 3$). When $A$ is not necessarily centered (note that this case is of lesser interest in terms of the applications in machine learning) we obtain an algorithm which achieves an approximation ratio of $1 + \frac{3\pi}{2}$ (this is probably sub-optimal). All of our algorithms, which are described in Section 2, use semi-definite programming in a perhaps non-obvious way. The rounding algorithm of our semi-definite relaxation amounts to proving certain geometric inequalities which can be viewed as variants of the positive semi-definite Grothendieck inequality. This analysis is presented in Section 2. As a concrete example we state in this introduction the following Grothendieck-type inequality which corresponds to our $\frac{8\pi}{9}\left(1 - \frac{1}{k}\right)$ algorithm:

Theorem 1.1. Let $(a_{ij})$ be an $n \times n$ positive semi-definite matrix with $\sum_{i,j=1}^n a_{ij} = 0$. Then for every $k \ge 3$
andvi, . . .,vi( € S we have 

n 8 / 1 \ " 

"^^^.^ 1 Z '''^■^^'' ''■'■^ - T h ~ I , , T'', , n Z ^'V^MO' M7)>- (6) 

(,7=1 ^ ' i,j=\ 

Inequality (6) is sharp when $k = 3$, and we conjecture that it is sharp for all $k \ge 4$. This conjecture is related to a geometric conjecture which we describe below.

The Unique Games Conjecture, hardness of approximation, and the propeller problem. Our hardness result for the kernel clustering problem is based on the Unique Games Conjecture which was put forth by Khot in [13]. We shall now describe this conjecture. A Unique Game is an optimization problem with an instance $\mathcal{L} = \mathcal{L}\big(G(V, W, E), n, \{\pi_{vw}\}_{(v,w) \in E}\big)$. Here $G(V, W, E)$ is a regular bipartite graph with vertex sets $V$ and $W$ and edge set $E$. Each vertex is supposed to receive a label from the set $\{1, \ldots, n\}$. For every edge $(v, w) \in E$ with $v \in V$ and $w \in W$, there is a given permutation $\pi_{vw} : \{1, \ldots, n\} \to \{1, \ldots, n\}$. A labeling of the Unique Game instance is an assignment $\rho : V \cup W \to \{1, \ldots, n\}$. An edge $(v, w)$ is satisfied by a labeling $\rho$ if and only if $\rho(v) = \pi_{vw}(\rho(w))$. The goal is to find a labeling that maximizes the fraction of edges satisfied (call this maximum $\mathrm{OPT}(\mathcal{L})$). We think of the number of labels $n$ as a constant and the size of the graph $G(V, W, E)$ as the size of the problem instance.

The Unique Games Conjecture asserts that for arbitrarily small constants $\varepsilon, \delta > 0$, there exists a constant $n = n(\varepsilon, \delta)$ such that no polynomial time algorithm can distinguish whether a Unique Games instance $\mathcal{L}$ with $n$ labels satisfies $\mathrm{OPT}(\mathcal{L}) \ge 1 - \varepsilon$ or $\mathrm{OPT}(\mathcal{L}) \le \delta$.¹ This conjecture is (by now) a commonly used complexity assumption to prove hardness of approximation results. Despite several recent attempts to get better polynomial time approximation algorithms for the Unique Game problem (see the table in lH for a description of known results), the Unique Games Conjecture still stands.
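To make the definition of $\mathrm{OPT}$ concrete, here is a minimal Python sketch that evaluates a labeling and finds the optimum of a toy instance by exhaustive search. The instance, the 0-based labels and all names are illustrative assumptions; a permutation is stored as a tuple `pi` with `pi[b]` the label required at $v$ when $w$ has label `b`.

```python
from itertools import product

def satisfied_fraction(edges, perms, labeling):
    """Fraction of edges (v, w) with labeling[v] == perms[(v, w)][labeling[w]]."""
    good = sum(1 for v, w in edges if labeling[v] == perms[(v, w)][labeling[w]])
    return good / len(edges)

def opt(vertices, edges, perms, n):
    """OPT of the instance by exhaustive search over all labelings (tiny instances only)."""
    return max(
        satisfied_fraction(edges, perms, dict(zip(vertices, labels)))
        for labels in product(range(n), repeat=len(vertices))
    )

# Toy instance with n = 2 labels on the complete bipartite graph K_{2,2}.
identity, swap = (0, 1), (1, 0)
vertices = ["v0", "v1", "w0", "w1"]
edges = [("v0", "w0"), ("v0", "w1"), ("v1", "w0"), ("v1", "w1")]
perms = {edges[0]: identity, edges[1]: identity, edges[2]: identity, edges[3]: swap}
toy_opt = opt(vertices, edges, perms, 2)
print(toy_opt)
```

In this toy instance the three identity constraints force all four labels to be equal, which contradicts the swap constraint, so no labeling satisfies every edge.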

Our UGC hardness result for kernel clustering, which is presented in Section 3, is based at heart on the "dictatorship vs. low-influence" paradigm that is recurrent in UGC hardness results (for example [13, 15]). In order to apply this paradigm one usually designs a probabilistic test on a given Boolean function on the Boolean hypercube and then analyzes the acceptance probability of this test in the two extremes of dictatorship functions and functions without influential variables. The gap between these two acceptance probabilities translates into the hardness of approximation factor. In our case, instead of a probabilistic test we need to design a positive semidefinite quadratic form on the truth table of the function. Our form is the sum of the squares of the Fourier coefficients of level 1. This already yields $\frac{\pi}{2}$ UGC hardness when $k = 2$.

¹As stated in [13], the conjecture says that it is NP-hard to distinguish between these two cases. However, if one only wants to rule out polynomial time algorithms, the conjecture as stated here suffices.

For larger $k$ we need to work with functions from $\{1, \ldots, k\}^n$ to $\{1, \ldots, k\}$. The analysis of this approach leads to the "propeller problem" which we now describe. The details of this connection are explained in Section 3.

We believe that one of the interesting aspects of the present paper is that complexity considerations lead to geometric/analytic problems which are of independent interest. Similar such connections have been recently discovered in [14, 8]. In our case the reduction from UGC to kernel clustering leads to the following question, which we call the "propeller problem" for reasons that will become clear presently. Let $\gamma_{k-1}$ denote the standard Gaussian measure on $\mathbb{R}^{k-1}$, i.e. the density of $\gamma_{k-1}$ is $(2\pi)^{-(k-1)/2} e^{-\|x\|_2^2/2}$. Let $A_1, \ldots, A_k$ be a partition of $\mathbb{R}^{k-1}$ into measurable sets. For each $i \in \{1, \ldots, k\}$ consider the Gaussian moment of the set $A_i$, i.e. the vector

$$z_i := \int_{A_i} x \, d\gamma_{k-1}(x) \in \mathbb{R}^{k-1}.$$

Our goal is to find the partition which maximizes the sum of the squared Euclidean lengths of the Gaussian moments of the elements of the partition, i.e. $\sum_{i=1}^k \|z_i\|_2^2$. Let $C(k)$ denote the value of this maximum (in Section 3.1 we show that this is indeed a maximum and not just a supremum). In Section 3 we show that assuming the UGC there is no polynomial time algorithm which approximates $\mathrm{Clust}(A|I_k)$ to a factor smaller than $\frac{1}{C(k)}\left(1 - \frac{1}{k}\right)$. In Section 3.1 we show that $C(2) = \frac{1}{\pi}$ and $C(3) = \frac{9}{8\pi}$. The value of $C(3)$ comes from the partition of the plane $\mathbb{R}^2$ into a "propeller", i.e. three cones of angle $\frac{2\pi}{3}$ with cusp at the origin. Most of Section 3.1 is devoted to the proof of the following theorem:

Theorem 1.2. $C(k)$ is attained at a simplicial conical partition, i.e. a partition $A_1, \ldots, A_k$ of $\mathbb{R}^{k-1}$ which has the following form: let $A_1, \ldots, A_m$ be the elements of the partition which have positive measure. Then $A_j = B_j \times \mathbb{R}^{k-m}$, where $B_j \subseteq \mathbb{R}^{m-1}$ is a cone with cusp at $0$ whose base is a simplex.

It is tempting to believe that the optimal simplicial conical partition described in Theorem 1.2 occurs when the cones $B_1, \ldots, B_m$ are generated by the regular simplex. However, in Section 3.1 we prove that among such regular simplicial conical partitions the one which maximizes the sum of the squared lengths of its Gaussian moments is when $m = 3$.

We therefore conjecture that for every $k \ge 3$ an optimal partition for the problem described above is actually $\{C_1 \times \mathbb{R}^{k-3}, C_2 \times \mathbb{R}^{k-3}, C_3 \times \mathbb{R}^{k-3}\}$, where $\{C_1, C_2, C_3\}$ is the propeller partition of $\mathbb{R}^2$ (see Figure 1). If this "propeller conjecture" holds true then it would follow that our $\frac{8\pi}{9}\left(1 - \frac{1}{k}\right)$ approximation algorithm is optimal assuming the UGC for every $k \ge 4$, and not just for $k \in \{2, 3\}$. The full propeller conjecture seems to be a challenging geometric problem of independent interest, not just due to the connection that we establish between it and the study of hardness of approximation for kernel clustering.
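The sum of squared Gaussian moments of a concrete planar partition can be estimated numerically. The following Python sketch is a Monte Carlo estimate only, with an arbitrary sample size and seed and illustrative function names. It compares the planar propeller with $\frac{9}{8\pi}$ and a partition into two half-planes with $\frac{1}{\pi}$; the half-plane partition is the product of the two half-lines in $\mathbb{R}$ with an orthogonal line and has the same moments.

```python
import math
import random

def propeller_cell(x, y):
    """Index (0, 1 or 2) of the 120-degree cone with cusp at the origin containing (x, y)."""
    angle = math.atan2(y, x) % (2 * math.pi)
    return min(int(angle // (2 * math.pi / 3)), 2)

def half_plane_cell(x, y):
    """Index of the half-plane {x >= 0} or {x < 0} containing (x, y)."""
    return 0 if x >= 0 else 1

def sum_squared_moments(cell, cells, samples=200_000, seed=0):
    """Monte Carlo estimate of sum_i ||z_i||^2 for a partition of the plane."""
    rng = random.Random(seed)
    sums = [[0.0, 0.0] for _ in range(cells)]
    for _ in range(samples):
        x, y = rng.gauss(0, 1), rng.gauss(0, 1)
        s = sums[cell(x, y)]
        s[0] += x
        s[1] += y
    return sum((sx / samples) ** 2 + (sy / samples) ** 2 for sx, sy in sums)

propeller_estimate = sum_squared_moments(propeller_cell, 3)
half_plane_estimate = sum_squared_moments(half_plane_cell, 2)
print(propeller_estimate, 9 / (8 * math.pi))
print(half_plane_estimate, 1 / math.pi)
```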

We end this introduction with an explanation of how our work relates to the recent result of Raghavendra [19] which shows that for any generalized constraint satisfaction problem² (CSP) there is a generic way of writing a semidefinite relaxation that achieves an optimal approximation ratio assuming the Unique Games Conjecture. Our clustering problem fits in the framework of [19] as follows: we wish to compute

$$\max\Big\{ \sum_{i,j=1}^n a_{ij} b_{\sigma(i)\sigma(j)} :\ \sigma : \{1, \ldots, n\} \to \{1, \ldots, k\} \Big\}, \tag{7}$$

²In a generalized CSP, every assignment to variables in a constraint has a real-valued (possibly negative) pay-off instead of a simple decision saying that the assignment is a satisfying assignment or not.

Figure 1: The conjectured optimal partition for the "sum of squares of Gaussian moments problem" described above consists of a partition of $\mathbb{R}^{k-1}$ into 3 parts, and the remaining $k - 3$ parts are empty. This partition corresponds to a planar 120° "propeller" multiplied by an orthogonal copy of $\mathbb{R}^{k-3}$.

where $(a_{ij})$ is a centered positive semi-definite matrix and $(b_{ij})$ is a positive semi-definite matrix. One can think of this problem as a CSP (with an extra global constraint corresponding to the positive semi-definiteness) where the set of variables is $\{1, \ldots, n\}$ and we wish to assign each variable a value from the domain $\{1, \ldots, k\}$. For every pair $(i, j) \in \{1, \ldots, n\} \times \{1, \ldots, n\}$, there is a constraint with weight $a_{ij}$. We get a payoff of $b_{st}$ if variables $i$ and $j$ are assigned $s \in \{1, \ldots, k\}$ and $t \in \{1, \ldots, k\}$ respectively.

Raghavendra shows that every integrality gap instance for his generic SDP relaxation can be translated into a UGC-hardness of approximation result with the hardness factor (essentially) the same as the integrality gap. We make here the non-trivial observation that in the reduction of [19], starting with an integrality gap instance for (the generic SDP relaxation of) the clustering problem (7), the matrix of the constraint weights $(a_{ij})$ indeed turns out to be positive semi-definite as required in the kernel clustering problem (this requires proof; the details are omitted since this is a digression from the topic of this paper). Thus Raghavendra's result can be made to apply to the kernel clustering problem (i.e. the generic SDP achieves the optimum approximation ratio assuming UGC).

Nevertheless, it is also useful to look at different relaxations and rounding procedures for the following reasons. Firstly, for a given problem there could be an SDP relaxation that is more natural than the generic one and might be easier to work with. Secondly, Raghavendra's result (that the integrality gap is the same as the hardness factor) applies only when the integrality gap is a constant. This is a priori not clear for the kernel clustering problem. For instance, a priori the integrality gap could be $\Omega(\log n)$ (as is the case for the Grothendieck problem on a general graph; see HI). So before applying the result of [19], one would need to show that the integrality gap of the generic SDP is indeed a constant. Thirdly, for CSPs with negative payoffs (as is the case in the kernel clustering problem), Raghavendra shows that the value computed by the generic SDP achieves the optimal approximation ratio (modulo UGC), but the paper does not give a rounding procedure. Finally, Raghavendra's result does not really shed light on the exact hardness threshold in the sense that it shows how to translate integrality gap instances into a UGC hardness result, but gives no idea as to how to construct an integrality gap instance in the first place. Constructing the integrality gap instance in general amounts to answering a certain isoperimetric type geometric question (naturally leading to a dictatorship test, or the other way round; in other words, the geometric question itself might be inspired by the dictatorship test that we have in mind). Thus as far as we know, we cannot avoid designing an explicit dictatorship test and answering an isoperimetric type question, whether or not we start with Raghavendra's generic SDP that is guaranteed to be optimal. As mentioned before, in the clustering problem where $B = (b_{st})$ is centered and spherical, we show that the UGC-hardness threshold is at least $\frac{1}{C(k)}\left(1 - \frac{1}{k}\right)$, and characterizing $C(k)$ seems to be a challenging geometric question.



2 Constant factor approximation algorithms for kernel clustering

Let $A \in M_n(\mathbb{R})$ and $B \in M_k(\mathbb{R})$ be positive semidefinite matrices. Then there are $u_1, \ldots, u_n \in \mathbb{R}^n$ and $v_1, \ldots, v_k \in \mathbb{R}^k$ such that $a_{ij} = \langle u_i, u_j \rangle$ and $b_{ij} = \langle v_i, v_j \rangle$. Such vectors can be found in polynomial time (this is simply the Cholesky decomposition). The instance of the kernel clustering problem will be called centered if $\sum_{i,j=1}^n a_{ij} = 0$, or equivalently $\sum_{i=1}^n u_i = 0$. The instance will be called spherical if $b_{ii} = 1 = \|v_i\|_2^2$ for all $i \in \{1, \ldots, k\}$. Let $R(B)$ be the radius of the smallest Euclidean ball containing $\{v_1, \ldots, v_k\}$. Note that $R(B)$ is indeed only a function of $B$, i.e. it does not depend on the particular representation of $B$ as a Gram matrix. Moreover, it is possible to compute $R(B)$, and given the decomposition $b_{ij} = \langle v_i, v_j \rangle$ a vector $w \in \mathbb{R}^k$ such that $\max_{j \in \{1, \ldots, k\}} \|v_j - w\|_2 = R(B)$, in polynomial time (see [10]).

Our goal is to compute in polynomial time the quantity:

$$\mathrm{Clust}(A|B) := \max_{\sigma : \{1, \ldots, n\} \to \{1, \ldots, k\}} \sum_{i,j=1}^n a_{ij} b_{\sigma(i)\sigma(j)} = \max_{\sigma : \{1, \ldots, n\} \to \{1, \ldots, k\}} \sum_{i,j=1}^n \langle u_i, u_j \rangle \langle v_{\sigma(i)}, v_{\sigma(j)} \rangle.$$

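Our algorithm, which is based on semidefinite programming, proceeds via the following steps:

1. Compute a Cholesky decomposition of $B$, i.e. $v_1, \ldots, v_k \in \mathbb{R}^k$ with $b_{ij} = \langle v_i, v_j \rangle$.

2. Compute (using for example [10]) $R(B)$ and a vector $w \in \mathbb{R}^k$ such that

$$\max_{j \in \{1, \ldots, k\}} \|v_j - w\|_2 = R(B).$$

3. Solve the semidefinite program

$$\max\Big\{ \sum_{i,j=1}^n a_{ij} \cdot \big\langle \|w\|_2 u + R(B) x_i,\ \|w\|_2 u + R(B) x_j \big\rangle :\ u, x_1, \ldots, x_n \in \mathbb{R}^{n+1} \ \wedge\ \|u\|_2 = 1 \ \wedge\ \forall i\ \|x_i\|_2 \le 1 \Big\}.$$

4. Choose $p, q \in \{1, \ldots, k\}$ such that $\|v_p - v_q\|_2 = \max_{i,j \in \{1, \ldots, k\}} \|v_i - v_j\|_2$. Let $g_1, g_2 \in \mathbb{R}^{n+1}$ be i.i.d. standard Gaussian vectors and define $\sigma : \{1, \ldots, n\} \to \{1, \ldots, k\}$ by

$$\sigma(r) = \begin{cases} p & \text{if } \langle g_1, x_r \rangle \ge \langle g_2, x_r \rangle, \\ q & \text{if } \langle g_2, x_r \rangle > \langle g_1, x_r \rangle. \end{cases} \tag{8}$$

5. Choose distinct $\alpha, \beta, \gamma \in \{1, \ldots, k\}$ such that

$$\Big\| v_\alpha - \frac{v_\alpha + v_\beta + v_\gamma}{3} \Big\|_2^2 + \Big\| v_\beta - \frac{v_\alpha + v_\beta + v_\gamma}{3} \Big\|_2^2 + \Big\| v_\gamma - \frac{v_\alpha + v_\beta + v_\gamma}{3} \Big\|_2^2$$

is maximized among all such choices of $\alpha, \beta, \gamma$. Let $g_1, g_2, g_3 \in \mathbb{R}^{n+1}$ be i.i.d. standard Gaussian vectors and define $\tau : \{1, \ldots, n\} \to \{1, \ldots, k\}$ by

$$\tau(r) = \begin{cases} \alpha & \text{if } \langle g_1, x_r \rangle \ge \max\{\langle g_2, x_r \rangle, \langle g_3, x_r \rangle\}, \\ \beta & \text{if } \langle g_2, x_r \rangle \ge \max\{\langle g_1, x_r \rangle, \langle g_3, x_r \rangle\}, \\ \gamma & \text{if } \langle g_3, x_r \rangle \ge \max\{\langle g_1, x_r \rangle, \langle g_2, x_r \rangle\}. \end{cases} \tag{9}$$

6. Output $\sigma$ if $\sum_{i,j=1}^n a_{ij} b_{\sigma(i)\sigma(j)} \ge \sum_{i,j=1}^n a_{ij} b_{\tau(i)\tau(j)}$. Otherwise output $\tau$.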
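Steps 4 to 6 can be sketched in code as follows. The sketch is plain Python with 0-based labels and assumes $k \ge 3$. It also assumes that the vectors $x_1, \ldots, x_n$ from step 3 are already available as a list `xs`; solving the semidefinite program is not shown. All function names are illustrative. The subsets in steps 4 and 5 are found from $B$ alone, using $\|v_i - v_j\|_2^2 = b_{ii} + b_{jj} - 2b_{ij}$ and the standard identity $\sum_{\ell \in T} \|v_\ell - \frac{1}{|T|}\sum_{t \in T} v_t\|_2^2 = \frac{1}{|T|} \sum_{\{\ell, t\} \subseteq T} \|v_\ell - v_t\|_2^2$.

```python
import random
from itertools import combinations

def sqdist(B, i, j):
    """||v_i - v_j||^2 expressed through the Gram matrix B."""
    return B[i][i] + B[j][j] - 2 * B[i][j]

def best_subset(B, s):
    """Subset T of size s maximizing sum_{l in T} ||v_l - mean_T||^2 (steps 4 and 5)."""
    def spread(T):
        return sum(sqdist(B, i, j) for i, j in combinations(T, 2)) / s
    return max(combinations(range(len(B)), s), key=spread)

def gaussian_round(xs, labels, rng):
    """Give x_r the label whose Gaussian vector has the largest inner product with x_r."""
    gs = [[rng.gauss(0, 1) for _ in xs[0]] for _ in labels]
    def pick(x):
        scores = [sum(gc * xc for gc, xc in zip(g, x)) for g in gs]
        return labels[scores.index(max(scores))]
    return [pick(x) for x in xs]

def objective(A, B, sigma):
    """sum_{i,j} a_ij * b_{sigma(i) sigma(j)}."""
    n = len(A)
    return sum(A[i][j] * B[sigma[i]][sigma[j]] for i in range(n) for j in range(n))

def round_solution(A, B, xs, seed=0):
    """Steps 4-6: round with two labels and with three labels, keep the better assignment."""
    rng = random.Random(seed)
    sigma = gaussian_round(xs, best_subset(B, 2), rng)
    tau = gaussian_round(xs, best_subset(B, 3), rng)
    return sigma if objective(A, B, sigma) >= objective(A, B, tau) else tau
```

A single call produces one random assignment; since the guarantees below are about a random output, one natural use is to repeat the rounding with several seeds and keep the best assignment found.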

Remark 2.1. The astute reader might notice that there is an obvious generalization of the above algorithm. Namely, for every fixed integer $s \in [2, k]$ we can choose a subset $S \subseteq \{1, \ldots, k\}$ of cardinality $s$ which maximizes the quantity

$$\sum_{i \in S} \Big\| v_i - \frac{1}{s} \sum_{j \in S} v_j \Big\|_2^2.$$

Then, we can choose $s$ i.i.d. standard Gaussians $\{g_i\}_{i \in S} \subseteq \mathbb{R}^{n+1}$ and define $\sigma_s : \{1, \ldots, n\} \to \{1, \ldots, k\}$ analogously to the above, namely $\sigma_s(r) = i$ if

$$\langle g_i, x_r \rangle = \max_{j \in S} \langle g_j, x_r \rangle.$$

Then, we can consider the assignments $\sigma_2, \sigma_3, \ldots, \sigma_k$ and choose the one which maximizes the objective $\sum_{i,j=1}^n a_{ij} b_{\sigma_\ell(i)\sigma_\ell(j)}$. In spite of this flexibility, it turns out that the rounding method described above does not improve if we take $s \ge 4$. In order to demonstrate this fact we will proceed below to analyze the algorithm for general $s$, and then optimize over $s$.

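Bounds on the performance of the above algorithm are contained in the following theorem:

Theorem 2.1. Assume that $A$ is centered, i.e. that $\sum_{i,j=1}^n a_{ij} = 0$. Let $p, q, \alpha, \beta, \gamma \in \{1, \ldots, k\}$ and $v_1, \ldots, v_k$ be as in the description above. Then the algorithm outputs in polynomial time a random assignment $\lambda : \{1, \ldots, n\} \to \{1, \ldots, k\}$ satisfying

$$\mathrm{Clust}(A|B) \le \min\left\{ \frac{2\pi R(B)^2}{\|v_p - v_q\|_2^2},\ \frac{16\pi R(B)^2}{9\left( \left\| v_\alpha - \frac{v_\alpha + v_\beta + v_\gamma}{3} \right\|_2^2 + \left\| v_\beta - \frac{v_\alpha + v_\beta + v_\gamma}{3} \right\|_2^2 + \left\| v_\gamma - \frac{v_\alpha + v_\beta + v_\gamma}{3} \right\|_2^2 \right)} \right\} \cdot \mathbb{E}\Big[ \sum_{i,j=1}^n a_{ij} b_{\lambda(i)\lambda(j)} \Big]. \tag{10}$$

In particular we always have

$$\mathrm{Clust}(A|B) \le \pi\left(1 - \frac{1}{k}\right) \mathbb{E}\Big[ \sum_{i,j=1}^n a_{ij} b_{\lambda(i)\lambda(j)} \Big], \tag{11}$$

and if $B$ is centered and spherical, i.e. $\sum_{i,j=1}^k b_{ij} = 0$ and $b_{ii} = 1$ for all $i$, then

$$\mathrm{Clust}(A|B) \le \frac{8\pi}{9}\left(1 - \frac{1}{k}\right) \mathbb{E}\Big[ \sum_{i,j=1}^n a_{ij} b_{\lambda(i)\lambda(j)} \Big]. \tag{12}$$

The same bound in (12) holds true if $B$ is the identity matrix.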



We single out in the next theorem the case $k \in \{2, 3\}$, since in these cases we have matching UGC hardness results. Note that for general $k$ we obtain a factor $\pi$ approximation algorithm, answering positively the question posed by Song, Smola, Gretton and Borgwardt in [23].

Theorem 2.2. Assume that $A$ is centered and $B$ is a $2 \times 2$ matrix. Then our algorithm achieves a $\frac{\pi}{2}$ approximation factor. Assuming the Unique Games Conjecture no polynomial time algorithm achieves an approximation guarantee smaller than $\frac{\pi}{2}$ in this case.

Assume that $A$ is centered, $k = 3$ and $B$ is centered and spherical (since $k = 3$ this forces $B$ to be the Gram matrix of the degree three roots of unity in the complex plane). Then our algorithm achieves an approximation factor of $\frac{16\pi}{27}$. Assuming the Unique Games Conjecture no polynomial time algorithm achieves an approximation guarantee smaller than $\frac{16\pi}{27}$ in this case.

In fact, we believe that the UGC hardness threshold for the kernel clustering problem when $A$ is centered and $B$ is spherical and centered is exactly

$$\frac{8\pi}{9}\left(1 - \frac{1}{k}\right).$$

In Section 3 we describe a geometric conjecture which we show implies this tight UGC threshold for general $k$.

We end the discussion by stating a (probably suboptimal) constant factor approximation result when $A$ is not necessarily centered (note that this case is of lesser interest in terms of the applications in machine learning). In this case the above algorithm gives a constant factor approximation. The slightly better bound on the approximation factor in Theorem 2.3 below follows from a variant of the above algorithm which will be described in its proof.

Theorem 2.3. For general $A$ and $B$ (not necessarily centered) there exists a polynomial time algorithm that achieves an approximation factor of

$$1 + 2\pi \max_{i \in \{1, \ldots, k\}} \frac{\left\| v_i - \frac{v_p + v_q}{2} \right\|_2^2}{\|v_p - v_q\|_2^2} \le 1 + \frac{3\pi}{2}.$$

The proof of Theorem 2.2 is contained in Section 3. We shall now proceed to prove Theorem 2.1. Before doing so we will show how the general bound in (10) implies the bounds (11) and (12). The proof of Theorem 2.3 is deferred to the end of this section.

To prove that (10) implies (11) let $D$ denote the diameter of the set $\{v_1, \ldots, v_k\}$, i.e. $D = \|v_p - v_q\|_2$. A classical theorem of Jung [12] (see [7]) says that

$$R(B) \le D \sqrt{\frac{k-1}{2k}},$$

and (11) follows immediately by taking the first term in the minimum in (10).
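As a quick numerical illustration, Jung's bound is attained by the standard basis vectors $e_1, \ldots, e_k$ of $\mathbb{R}^k$ (the case $B = I_k$), for which the first term in (10) equals $\pi(1 - \frac{1}{k})$ exactly. The Python sketch below checks this; the function names are illustrative, and it uses the fact that by symmetry the smallest ball containing a regular simplex is centered at its centroid.

```python
import math

def jung_radius_bound(diameter, k):
    """Right-hand side of Jung's theorem for k points of the given diameter."""
    return diameter * math.sqrt((k - 1) / (2 * k))

def simplex_radius_and_diameter(k):
    """Radius of the smallest enclosing ball and diameter of {e_1, ..., e_k} in R^k."""
    centroid = [1 / k] * k
    e1 = [1.0] + [0.0] * (k - 1)
    radius = math.dist(e1, centroid)   # every e_i is at this distance from the centroid
    diameter = math.sqrt(2)            # ||e_i - e_j||_2 for i != j
    return radius, diameter

for k in range(2, 8):
    r, d = simplex_radius_and_diameter(k)
    print(k, r, jung_radius_bound(d, k), 2 * math.pi * r * r / (d * d))
```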

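We shall now show that (10) implies (12) when $B$ is either centered and spherical or the identity matrix. Assume first of all that $B$ is centered and spherical. Note that since $v_1, \ldots, v_k$ are unit vectors, $R(B) \le 1$. Hence, by considering the second term in the minimum in (10) we see that it is enough to show that there exist $\alpha, \beta, \gamma \in \{1, \ldots, k\}$ for which

$$\Big\| v_\alpha - \frac{v_\alpha + v_\beta + v_\gamma}{3} \Big\|_2^2 + \Big\| v_\beta - \frac{v_\alpha + v_\beta + v_\gamma}{3} \Big\|_2^2 + \Big\| v_\gamma - \frac{v_\alpha + v_\beta + v_\gamma}{3} \Big\|_2^2 \ge \frac{2k}{k-1}.$$

This follows from an averaging argument. Indeed, since $v_1, \ldots, v_k$ are unit vectors with $\sum_{i=1}^k v_i = 0$,

$$\begin{aligned}
\frac{1}{\binom{k}{3}} \sum_{\substack{\alpha, \beta, \gamma \in \{1, \ldots, k\} \\ \alpha < \beta < \gamma}} &\left( \Big\| v_\alpha - \frac{v_\alpha + v_\beta + v_\gamma}{3} \Big\|_2^2 + \Big\| v_\beta - \frac{v_\alpha + v_\beta + v_\gamma}{3} \Big\|_2^2 + \Big\| v_\gamma - \frac{v_\alpha + v_\beta + v_\gamma}{3} \Big\|_2^2 \right) \\
&= 2 - \frac{2}{k(k-1)} \sum_{\substack{i,j=1 \\ i \neq j}}^k \langle v_i, v_j \rangle = 2 - \frac{2}{k(k-1)} \left( \Big\| \sum_{i=1}^k v_i \Big\|_2^2 - \sum_{i=1}^k \|v_i\|_2^2 \right) = 2 + \frac{2}{k-1} = \frac{2k}{k-1}.
\end{aligned}$$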

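This completes the proof of (12) when $B$ is spherical and centered. The same bound holds true when $B = I_k$ is the identity matrix since in this case if we denote by $e_1, \ldots, e_k$ the standard unit basis of $\mathbb{R}^k$ and $e = \frac{1}{k} \sum_{i=1}^k e_i$ then for every assignment $\lambda : \{1, \ldots, n\} \to \{1, \ldots, k\}$ we have

$$\begin{aligned}
\sum_{i,j=1}^n a_{ij} (I_k)_{\lambda(i)\lambda(j)} &= \sum_{i,j=1}^n \langle u_i, u_j \rangle \langle e_{\lambda(i)}, e_{\lambda(j)} \rangle \\
&= \sum_{i,j=1}^n \langle u_i, u_j \rangle \langle e_{\lambda(i)} - e, e_{\lambda(j)} - e \rangle + 2 \Big\langle \sum_{i=1}^n u_i, \sum_{j=1}^n \langle e, e_{\lambda(j)} \rangle u_j \Big\rangle - \|e\|_2^2 \Big\| \sum_{i=1}^n u_i \Big\|_2^2.
\end{aligned} \tag{13}$$

The last two terms in (13) vanish since $A$ is centered. Thus

$$\sum_{i,j=1}^n a_{ij} (I_k)_{\lambda(i)\lambda(j)} = \frac{k-1}{k} \sum_{i,j=1}^n a_{ij} c_{\lambda(i)\lambda(j)},$$

where $C = (c_{ij}) = \frac{k}{k-1} \big( \langle e_i - e, e_j - e \rangle \big)$ is spherical and centered. Thus the case of the identity matrix reduces to the previous analysis.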



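Proof of Theorem 2.1. Denote

$$\mathrm{SDP} := \max \sum_{i,j=1}^n a_{ij} \cdot \big\langle \|w\|_2 u + R(B) x_i,\ \|w\|_2 u + R(B) x_j \big\rangle,$$

where the maximum is taken over all $u, x_1, \ldots, x_n \in \mathbb{R}^{n+1}$ such that $\|u\|_2 = 1$ and $\|x_i\|_2 \le 1$ for all $i$. Observe that

$$\mathrm{SDP} \ge \mathrm{Clust}(A|B). \tag{14}$$

Indeed, for every $\lambda : \{1, \ldots, n\} \to \{1, \ldots, k\}$ define $u = \frac{w}{\|w\|_2}$ and $x_i = \frac{v_{\lambda(i)} - w}{R(B)}$ and note that in this case

$$\sum_{i,j=1}^n a_{ij} \cdot \big\langle \|w\|_2 u + R(B) x_i,\ \|w\|_2 u + R(B) x_j \big\rangle = \sum_{i,j=1}^n a_{ij} b_{\lambda(i)\lambda(j)}.$$

Let $u^*, x_1^*, \ldots, x_n^*$ be the optimal solution to the SDP. It will be convenient to think of the SDP solution as being split into two parts. So we rewrite

$$\begin{aligned}
\mathrm{SDP} &= \sum_{i,j=1}^n a_{ij} \cdot \big\langle \|w\|_2 u^* + R(B) x_i^*,\ \|w\|_2 u^* + R(B) x_j^* \big\rangle \\
&= \sum_{i,j=1}^n \langle u_i, u_j \rangle \cdot \big\langle \|w\|_2 u^* + R(B) x_i^*,\ \|w\|_2 u^* + R(B) x_j^* \big\rangle \\
&= \Big\| \sum_{i=1}^n u_i \otimes \big( \|w\|_2 u^* + R(B) x_i^* \big) \Big\|_2^2
\end{aligned} \tag{15}$$

$$= \Big\| \|w\|_2 \sum_{i=1}^n u_i \otimes u^* + R(B) \sum_{i=1}^n u_i \otimes x_i^* \Big\|_2^2 = \|P + Q\|_2^2, \tag{16}$$

where

$$P := \|w\|_2 \sum_{i=1}^n u_i \otimes u^*, \tag{17}$$

and

$$Q := R(B) \sum_{i=1}^n u_i \otimes x_i^*. \tag{18}$$

Observe in passing that (16) implies that the objective function of the SDP is convex as a function of $u, x_1, \ldots, x_n$, and therefore we may assume that $\|u^*\|_2 = 1$ and $\|x_i^*\|_2 = 1$ for all $i$.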

We shall now proceed with the analysis of our algorithm while using the variant described in Remark 2.1. This will not create any additional complication, and will allow us to explain why there is no advantage in working with subsets of size $s \ge 4$. Recall the setting: for a fixed integer $s \in [2, k]$ we choose a subset $S \subseteq \{1, \ldots, k\}$ of cardinality $s$ which maximizes the quantity

$$\sum_{i \in S} \Big\| v_i - \frac{1}{s} \sum_{j \in S} v_j \Big\|_2^2.$$

Then, we choose $s$ i.i.d. standard Gaussians $\{g_i\}_{i \in S} \subseteq \mathbb{R}^{n+1}$ and define $\sigma : \{1, \ldots, n\} \to \{1, \ldots, k\}$ by setting $\sigma(r) = i$ if

$$\langle g_i, x_r^* \rangle = \max_{j \in S} \langle g_j, x_r^* \rangle.$$

Fix $i, j \in \{1, \ldots, n\}$. As proved by Frieze and Jerrum in [3 (see Lemma 5 there), we have³:

$$\Pr[\sigma(i) = \sigma(j)] = \sum_{m=0}^\infty R_m(s) \langle x_i^*, x_j^* \rangle^m,$$

³We are using here the fact that $x_1^*, \ldots, x_n^*$ are unit vectors.

where the power series converges on $[-1, 1]$ and all the coefficients $R_m(s)$ are non-negative. Moreover $R_0(s) = \frac{1}{s}$ and


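$$R_1(s) = \frac{1}{s-1} \left( s \int_{-\infty}^{\infty} x \, \frac{e^{-x^2/2}}{\sqrt{2\pi}} \left( \int_{-\infty}^{x} \frac{e^{-y^2/2}}{\sqrt{2\pi}} \, dy \right)^{s-1} dx \right)^2.$$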
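The case $s = 2$ can be checked by hand: $\sigma(i) = \sigma(j)$ exactly when $\langle g_1 - g_2, x_i^* \rangle$ and $\langle g_1 - g_2, x_j^* \rangle$ have the same sign, which happens with probability $1 - \frac{\arccos\rho}{\pi} = \frac{1}{2} + \frac{\rho}{\pi} + \cdots$ for $\rho = \langle x_i^*, x_j^* \rangle$, so the constant term is $\frac{1}{2}$ and the linear coefficient is $\frac{1}{\pi}$. The Python sketch below compares this closed form with a simulation of the rounding for two unit vectors in the plane. The sample size, seed and function names are arbitrary illustrative choices.

```python
import math
import random

def collision_probability_two_labels(rho):
    """Pr[sigma(i) = sigma(j)] for s = 2 and unit vectors with inner product rho."""
    return 1 - math.acos(rho) / math.pi

def simulate_collision(rho, samples=100_000, seed=1):
    """Empirical frequency of sigma(i) = sigma(j) under the two-Gaussian rounding."""
    rng = random.Random(seed)
    xi = (1.0, 0.0)
    xj = (rho, math.sqrt(1 - rho * rho))
    hits = 0
    for _ in range(samples):
        g1 = (rng.gauss(0, 1), rng.gauss(0, 1))
        g2 = (rng.gauss(0, 1), rng.gauss(0, 1))
        first_wins_i = g1[0] * xi[0] + g1[1] * xi[1] >= g2[0] * xi[0] + g2[1] * xi[1]
        first_wins_j = g1[0] * xj[0] + g1[1] * xj[1] >= g2[0] * xj[0] + g2[1] * xj[1]
        hits += first_wins_i == first_wins_j
    return hits / samples

print(collision_probability_two_labels(0.5), simulate_collision(0.5))
```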



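Note that conditioned on the event $\sigma(i) = \sigma(j)$, the random index $\sigma(i)$ is uniformly distributed over $S$. Also, conditioned on the event $\sigma(i) \neq \sigma(j)$, the pair $(\sigma(i), \sigma(j))$ is uniformly distributed over all $s(s-1)$ pairs of distinct indices in $S$. Thus

$$\mathbb{E}\big[ b_{\sigma(i)\sigma(j)} \big] = \Pr[\sigma(i) = \sigma(j)] \cdot \frac{1}{s} \sum_{\ell \in S} b_{\ell\ell} + \Pr[\sigma(i) \neq \sigma(j)] \cdot \frac{1}{s(s-1)} \sum_{\substack{\ell, t \in S \\ \ell \neq t}} b_{\ell t}.$$

Denote $\Phi := \frac{1}{s} \sum_{\ell \in S} b_{\ell\ell}$ and $\Psi := \frac{1}{s(s-1)} \sum_{\ell, t \in S,\ \ell \neq t} b_{\ell t}$ (note that $\Phi, \Psi$ depend on the matrix $B$ as well as the choice of the subset $S \subseteq \{1, \ldots, k\}$). Thus

$$\begin{aligned}
\mathbb{E}\big[ b_{\sigma(i)\sigma(j)} \big] &= \Phi \left( \sum_{m=0}^\infty R_m(s) \langle x_i^*, x_j^* \rangle^m \right) + \Psi \left( 1 - \sum_{m=0}^\infty R_m(s) \langle x_i^*, x_j^* \rangle^m \right) \\
&= \big( \Psi + (\Phi - \Psi) R_0(s) \big) + (\Phi - \Psi) \sum_{m=1}^\infty R_m(s) \langle x_i^*, x_j^* \rangle^m.
\end{aligned} \tag{19}$$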



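Write $v := \frac{1}{s} \sum_{\ell \in S} v_\ell$. Observe that

$$\Psi + (\Phi - \Psi) R_0(s) = \|v\|_2^2. \tag{20}$$

Indeed, since $R_0(s) = 1/s$ we have

$$\Psi + (\Phi - \Psi) R_0(s) = \left( 1 - \frac{1}{s} \right) \Psi + \frac{\Phi}{s} = \frac{1}{s^2} \sum_{\ell, t \in S} b_{\ell t} = \Big\| \frac{1}{s} \sum_{\ell \in S} v_\ell \Big\|_2^2 = \|v\|_2^2.$$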

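Moreover,

$$(s - 1)(\Phi - \Psi) = \sum_{\ell \in S} \|v_\ell - v\|_2^2. \tag{21}$$

In particular $\Phi - \Psi \ge 0$. To prove (21) we simply expand:

$$\sum_{\ell \in S} \|v_\ell - v\|_2^2 = \sum_{\ell \in S} \|v_\ell\|_2^2 - s\|v\|_2^2 = s\Phi - \frac{1}{s} \sum_{\ell, t \in S} b_{\ell t} = s\Phi - \frac{1}{s} \big( s\Phi + s(s-1)\Psi \big) = (s - 1)(\Phi - \Psi).$$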
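The identities (20) and (21) are purely algebraic and can be checked numerically for arbitrary vectors. The following Python sketch does this for a random family of $s$ vectors; the names, the dimension and the seed are illustrative assumptions.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def identity_gaps(vectors):
    """Return (gap in (20), gap in (21)) for the family {v_l : l in S} given as a list."""
    s = len(vectors)
    phi = sum(dot(v, v) for v in vectors) / s
    psi = sum(dot(vectors[l], vectors[t])
              for l in range(s) for t in range(s) if l != t) / (s * (s - 1))
    mean = [sum(coords) / s for coords in zip(*vectors)]
    centered = [[a - m for a, m in zip(v, mean)] for v in vectors]
    gap20 = (psi + (phi - psi) / s) - dot(mean, mean)                   # uses R_0(s) = 1/s
    gap21 = (s - 1) * (phi - psi) - sum(dot(c, c) for c in centered)
    return gap20, gap21

rng = random.Random(7)
family = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(4)]
print(identity_gaps(family))
```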



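Multiplying both sides of equation (19) by $a_{ij}$ and summing over $i, j \in \{1, \ldots, n\}$ while using (20) we get that

$$\mathbb{E}\Big[ \sum_{i,j=1}^n a_{ij} b_{\sigma(i)\sigma(j)} \Big] = \|v\|_2^2 \sum_{i,j=1}^n a_{ij} + (\Phi - \Psi) R_1(s) \sum_{i,j=1}^n a_{ij} \langle x_i^*, x_j^* \rangle + (\Phi - \Psi) \sum_{m=2}^\infty R_m(s) \sum_{i,j=1}^n a_{ij} \langle x_i^*, x_j^* \rangle^m. \tag{22}$$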



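Note that for every $m \ge 1$ we have

$$\sum_{i,j=1}^n a_{ij} \langle x_i^*, x_j^* \rangle^m = \sum_{i,j=1}^n \langle u_i, u_j \rangle \big\langle (x_i^*)^{\otimes m}, (x_j^*)^{\otimes m} \big\rangle = \Big\| \sum_{i=1}^n u_i \otimes (x_i^*)^{\otimes m} \Big\|_2^2 \ge 0. \tag{23}$$

Plugging (23) into (22), and using the fact that $\Phi - \Psi \ge 0$ and the positivity of $R_m(s)$, we conclude that

$$\mathbb{E}\Big[ \sum_{i,j=1}^n a_{ij} b_{\sigma(i)\sigma(j)} \Big] \ge \|v\|_2^2 \sum_{i,j=1}^n a_{ij} + (\Phi - \Psi) R_1(s) \sum_{i,j=1}^n a_{ij} \langle x_i^*, x_j^* \rangle. \tag{24}$$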

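We shall now use the fact that $\sum_{i,j=1}^n a_{ij} = 0$ for the first time. In this case $P = 0$ (see equations (16) and (17)), so that

$$\mathrm{SDP} = \|Q\|_2^2 = R(B)^2 \sum_{i,j=1}^n a_{ij} \langle x_i^*, x_j^* \rangle. \tag{25}$$

Hence, combining (14), (21), (24) and (25), we get the bound

$$\mathbb{E}\Big[ \sum_{i,j=1}^n a_{ij} b_{\sigma(i)\sigma(j)} \Big] \ge \frac{R_1(s) \sum_{\ell \in S} \|v_\ell - v\|_2^2}{(s - 1) R(B)^2} \, \mathrm{SDP} \ge \frac{R_1(s) \sum_{\ell \in S} \|v_\ell - v\|_2^2}{(s - 1) R(B)^2} \, \mathrm{Clust}(A|B). \tag{26}$$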



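The term $R_1(s)$ is studied in Section 3.1 where its geometric interpretation is explained. In particular, it follows from Corollary 3.6 and Corollary 3.4 that $R_1(s) < R_1(3)$ for every $s\ge 4$ and that $R_1(2) = \frac{1}{\pi}$ and $R_1(3) = \frac{9}{8\pi}$. Hence the cases $s\in\{2,3\}$ in (26) conclude the proof of Theorem 2.1. Moreover, we see that for $s\ge 4$ the lower bound in (26) is worse than the lower bound obtained in the case $s=3$. Indeed, we have already noted that in this case $R_1(s) < R_1(3)$. In addition, averaging over all three-element subsets $T\subseteq S$,

$$\binom{s}{3}^{-1}\sum_{\substack{T\subseteq S\\ |T|=3}}\ \frac{1}{2}\sum_{t\in T}\left\|v_t - \frac{1}{3}\sum_{t'\in T}v_{t'}\right\|_2^2 = \frac{1}{s(s-1)}\sum_{\substack{\ell,t\in S\\ \ell<t}}\|v_\ell - v_t\|_2^2 = \frac{1}{s-1}\sum_{\ell\in S}\|v_\ell - v\|_2^2.$$

This implies that there exists $T\subseteq S$ with $|T| = 3$ for which

$$\frac{1}{2}\sum_{t\in T}\left\|v_t - \frac{1}{3}\sum_{t'\in T}v_{t'}\right\|_2^2 \ge \frac{1}{s-1}\sum_{\ell\in S}\|v_\ell - v\|_2^2,$$

so that when $s\ge 4$ the lower bound in (26) is inferior to the same lower bound when $s = 3$. □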
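The identities (20) and (21), and the averaging over three-element subsets used at the end of the proof, are easy to sanity-check numerically. The following is a minimal sketch, not part of the original argument; the helper names (`dot`, `sq_dist_sum`, `check_identities`) and the random test vectors are illustrative assumptions, with $b_{\ell t} = \langle v_\ell, v_t\rangle$ and $S$ taken to be the whole list of vectors.

```python
import itertools
import random

def dot(a, b):
    """Euclidean inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def sq_dist_sum(vectors):
    """Sum of squared distances of the vectors to their centroid."""
    dim = len(vectors[0])
    c = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return sum(dot([x - y for x, y in zip(v, c)],
                   [x - y for x, y in zip(v, c)]) for v in vectors)

def check_identities(vs):
    """Return the residuals of (20), (21) and of the 3-subset averaging identity."""
    s = len(vs)
    phi = sum(dot(v, v) for v in vs) / s
    psi = sum(dot(vs[l], vs[t]) for l in range(s) for t in range(s)
              if l != t) / (s * (s - 1))
    centroid = [sum(v[d] for v in vs) / s for d in range(len(vs[0]))]
    res20 = psi + (phi - psi) / s - dot(centroid, centroid)   # uses R_0(s) = 1/s
    res21 = (s - 1) * (phi - psi) - sq_dist_sum(vs)
    triples = list(itertools.combinations(vs, 3))
    avg = sum(0.5 * sq_dist_sum(list(T)) for T in triples) / len(triples)
    res_avg = avg - sq_dist_sum(vs) / (s - 1)
    return res20, res21, res_avg

if __name__ == "__main__":
    random.seed(0)
    vs = [[random.gauss(0, 1) for _ in range(4)] for _ in range(6)]
    print(check_identities(vs))  # all three residuals should be ~0 up to rounding
```

All three residuals vanish up to floating-point error for any choice of vectors, which is what the pigeonhole step "there exists $T$" relies on.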

It remains to deal with the case $\sum_{i,j=1}^n a_{ij} > 0$, i.e. to prove Theorem 2.3.

Proof of Theorem 2.3. We slightly modify the algorithm that was studied in Theorem 2.1. Let $v_1,\dots,v_k$ and $p,q\in\{1,\dots,k\}$ be as before, that is $b_{ij} = \langle v_i,v_j\rangle$ and $\|v_p - v_q\|_2 = \max_{i,j\in\{1,\dots,k\}}\|v_i - v_j\|_2 = D$, the diameter of the set $\{v_1,\dots,v_k\}\subseteq\mathbb{R}^k$. Denote $w' := \frac{v_p+v_q}{2}$ and

$$R'(B) := \max_{i\in\{1,\dots,k\}}\|v_i - w'\|_2.$$

We now consider the modified semidefinite program

$$\mathrm{SDP} := \max\sum_{i,j=1}^n a_{ij}\cdot\big\langle \|w'\|_2\,u + R'(B)x_i,\ \|w'\|_2\,u + R'(B)x_j\big\rangle,$$

where the maximum is taken over all $u,x_1,\dots,x_n\in\mathbb{R}^{n+1}$ such that $\|u\|_2 = 1$ and $\|x_i\|_2\le 1$ for all $i$. From now on we will use the notation of the proof of Theorem 2.1 with $w$ replaced by $w'$ and $R(B)$ replaced by $R'(B)$ (this slight abuse of notation will not create any confusion). As before, we let $g_1,g_2\in\mathbb{R}^{n+1}$ be i.i.d. standard Gaussian vectors and define $\sigma:\{1,\dots,n\}\to\{1,\dots,k\}$ by

$$\sigma(r) := \begin{cases} p & \text{if } \langle g_1,x_r\rangle \ge \langle g_2,x_r\rangle,\\ q & \text{if } \langle g_2,x_r\rangle > \langle g_1,x_r\rangle. \end{cases} \tag{27}$$

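Note that the first place in the proof of Theorem 2.1 where the assumption that $A$ is centered was used is equation (25). Hence, in the present setting we still have the bounds

$$\mathrm{Clust}(A|B) \le \mathrm{SDP} = \|P+Q\|_2^2 \le \big(\|P\|_2 + \|Q\|_2\big)^2, \tag{28}$$

where $P$ and $Q$ are defined in (17) and (18) (with $w$ and $R(B)$ replaced by $w'$ and $R'(B)$, respectively). Also, it follows from (24) that

$$\mathbb{E}\left[\sum_{i,j=1}^n a_{ij}b_{\sigma(i)\sigma(j)}\right] \ge \|v\|_2^2\sum_{i,j=1}^n a_{ij} + \big(\|v_p - v\|_2^2 + \|v_q - v\|_2^2\big)R_1(2)\sum_{i,j=1}^n a_{ij}\langle x_i^*,x_j^*\rangle, \tag{29}$$

where $v = \frac{v_p+v_q}{2} = w'$. Note that $\|v_p - v\|_2^2 + \|v_q - v\|_2^2 = \frac{D^2}{2}$, and recall that $R_1(2) = \frac{1}{\pi}$. Thus (29) becomes:

$$\mathbb{E}\left[\sum_{i,j=1}^n a_{ij}b_{\sigma(i)\sigma(j)}\right] \ge \|w'\|_2^2\sum_{i,j=1}^n a_{ij} + \frac{D^2}{2\pi}\sum_{i,j=1}^n a_{ij}\langle x_i^*,x_j^*\rangle. \tag{30}$$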

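Note that

$$\|P\|_2^2 = \|w'\|_2^2\left\|\sum_{i=1}^n u_i\otimes u\right\|_2^2 = \|w'\|_2^2\sum_{i,j=1}^n a_{ij} \tag{31}$$

and

$$\|Q\|_2^2 = R'(B)^2\left\|\sum_{i=1}^n u_i\otimes x_i^*\right\|_2^2 = R'(B)^2\sum_{i,j=1}^n a_{ij}\langle x_i^*,x_j^*\rangle. \tag{32}$$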



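Combining (28) and (30) with (31) and (32) we see that

$$\mathrm{Clust}(A|B) \le \frac{\big(\|P\|_2 + \|Q\|_2\big)^2}{\|P\|_2^2 + c\|Q\|_2^2}\ \mathbb{E}\left[\sum_{i,j=1}^n a_{ij}b_{\sigma(i)\sigma(j)}\right], \tag{33}$$

where $c = \frac{D^2}{2\pi R'(B)^2}$. The convexity of the function $x\mapsto x^2$ implies that

$$\big(\|P\|_2 + \|Q\|_2\big)^2 = \left(\frac{c}{c+1}\cdot\frac{c+1}{c}\|P\|_2 + \left(1-\frac{c}{c+1}\right)(c+1)\|Q\|_2\right)^2 \le \frac{c+1}{c}\|P\|_2^2 + (c+1)\|Q\|_2^2 = \left(1+\frac{1}{c}\right)\big(\|P\|_2^2 + c\|Q\|_2^2\big).$$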






Thus (33) implies that our algorithm achieves an approximation guarantee bounded above by

$$1+\frac{1}{c} = 1+\frac{2\pi R'(B)^2}{D^2} = 1+\frac{2\pi}{\|v_p - v_q\|_2^2}\cdot\max_{i\in\{1,\dots,k\}}\left\|v_i - \frac{v_p+v_q}{2}\right\|_2^2.$$

It remains to note that for every $i\in\{1,\dots,k\}$ we know that $\|v_i - v_p\|_2,\|v_i - v_q\|_2\le D$ and therefore $\|v_i - w'\|_2\le\frac{\sqrt{3}}{2}D$. This implies that our approximation guarantee is bounded from above by $1+\frac{3\pi}{2}$. □
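The last step can be spelled out with the parallelogram identity. The short derivation below is added as a sketch of the omitted computation; it uses only the definitions of $w' = (v_p+v_q)/2$, $D = \|v_p - v_q\|_2$ and $R'(B)$ given above.

```latex
\[
\|v_i - w'\|_2^2
  = \tfrac12\|v_i - v_p\|_2^2 + \tfrac12\|v_i - v_q\|_2^2 - \tfrac14\|v_p - v_q\|_2^2
  \le \tfrac12 D^2 + \tfrac12 D^2 - \tfrac14 D^2 = \tfrac34 D^2,
\]
\[
1 + \frac{2\pi R'(B)^2}{D^2} \le 1 + 2\pi\cdot\frac34 = 1 + \frac{3\pi}{2}.
\]
```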

3 UGC hardness 

3.1 Geometric preliminaries: Propeller problems 

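Let $\gamma_n$ be the standard Gaussian measure on $\mathbb{R}^n$. For any integer $k\ge 2$ define

$$C(n,k) := \sup\left\{\sum_{j=1}^k\left\|\int_{\mathbb{R}^n} x f_j(x)\,d\gamma_n(x)\right\|_2^2 :\ f_1,\dots,f_k\in L_2(\gamma_n)\ \wedge\ \forall j\ f_j\ge 0\ \wedge\ \sum_{j=1}^k f_j\le 1\right\}. \tag{34}$$

We first observe that the supremum in (34) is attained at a $k$-tuple of functions which correspond to a partition of $\mathbb{R}^n$:

Lemma 3.1. There exist disjoint measurable sets $A_1,\dots,A_k\subseteq\mathbb{R}^n$ such that $A_1\cup A_2\cup\dots\cup A_k = \mathbb{R}^n$ and

$$\sum_{j=1}^k\left\|\int_{A_j} x\,d\gamma_n(x)\right\|_2^2 = C(n,k).$$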



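Proof. Let $H$ be the Hilbert space $L_2(\gamma_n)\oplus L_2(\gamma_n)\oplus\dots\oplus L_2(\gamma_n)$ ($k$ times). Define $K\subseteq H$ to be the set of all $(f_1,\dots,f_k)\in H$ such that $f_j\ge 0$ for all $j$ and $\sum_{j=1}^k f_j\le 1$. Then $K$ is a closed, convex and bounded subset of $H$, and hence by the Banach-Alaoglu theorem it is weakly compact. The mapping $\psi:K\to\mathbb{R}$ given by

$$\psi(f_1,\dots,f_k) := \sum_{j=1}^k\left\|\int_{\mathbb{R}^n} x f_j(x)\,d\gamma_n(x)\right\|_2^2 = \sum_{j=1}^k\sum_{i=1}^n\left(\int_{\mathbb{R}^n} x_i f_j(x)\,d\gamma_n(x)\right)^2$$

is weakly continuous since the mapping $(x_1,\dots,x_n)\mapsto x_i$ is in $L_2(\gamma_n)$ for each $i$. Hence $\psi$ attains its maximum on $K$, say at $(f_1^*,\dots,f_k^*)\in K$.

Define $z_j := \int_{\mathbb{R}^n} x f_j^*(x)\,d\gamma_n(x)\in\mathbb{R}^n$ and let

$$w := \int_{\mathbb{R}^n} x\left(1-\sum_{j=1}^k f_j^*(x)\right)d\gamma_n(x).$$

Since $\int_{\mathbb{R}^n} x\,d\gamma_n(x) = 0$ we have $w = -\sum_{j=1}^k z_j$. Note that

$$\frac{1}{k}\sum_{i=1}^k\left(\sum_{\substack{1\le j\le k\\ j\neq i}}\|z_j\|_2^2 + \|z_i + w\|_2^2\right) = \sum_{j=1}^k\|z_j\|_2^2 + \|w\|_2^2 + \frac{2}{k}\left\langle w,\sum_{i=1}^k z_i\right\rangle = \sum_{j=1}^k\|z_j\|_2^2 + \left(1-\frac{2}{k}\right)\|w\|_2^2 \ge \sum_{j=1}^k\|z_j\|_2^2,$$

which implies the existence of $i\in\{1,\dots,k\}$ for which

$$\sum_{\substack{1\le j\le k\\ j\neq i}}\|z_j\|_2^2 + \|z_i + w\|_2^2 \ge \sum_{j=1}^k\|z_j\|_2^2.$$

Hence, if we define for $j\in\{1,\dots,k\}$,

$$g_j := \begin{cases} f_j^* & j\neq i,\\ f_i^* + 1 - \sum_{\ell=1}^k f_\ell^* & j = i, \end{cases}$$

then $(g_1,\dots,g_k)\in K$, and

$$C(n,k) \ge \sum_{j=1}^k\left\|\int_{\mathbb{R}^n} x g_j(x)\,d\gamma_n(x)\right\|_2^2 = \sum_{\substack{1\le j\le k\\ j\neq i}}\|z_j\|_2^2 + \|z_i + w\|_2^2 \ge \sum_{j=1}^k\|z_j\|_2^2 = C(n,k).$$

So

$$\sum_{j=1}^k\left\|\int_{\mathbb{R}^n} x g_j(x)\,d\gamma_n(x)\right\|_2^2 = C(n,k).$$

Note that $\sum_{j=1}^k g_j = 1$, so we can define a random partition $A_1,\dots,A_k$ of $\mathbb{R}^n$ as follows: let $\{s_x\}_{x\in\mathbb{R}^n}$ be independent random variables taking values in $\{1,\dots,k\}$ such that $\Pr(s_x = j) = g_j(x)$, and define $A_j := \{x\in\mathbb{R}^n : s_x = j\}$. Then by convexity and the definition of $C(n,k)$ we see that

$$C(n,k) \ge \mathbb{E}\left[\sum_{j=1}^k\left\|\int_{A_j} x\,d\gamma_n(x)\right\|_2^2\right] \ge \sum_{j=1}^k\left\|\int_{\mathbb{R}^n}\big(\mathbb{E}\mathbf{1}_{A_j}(x)\big)x\,d\gamma_n(x)\right\|_2^2 = \sum_{j=1}^k\left\|\int_{\mathbb{R}^n} x g_j(x)\,d\gamma_n(x)\right\|_2^2 = C(n,k).$$

It therefore follows that there exists a partition as required. □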

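Lemma 3.2. If $n\ge k-1$ then $C(n,k) = C(k-1,k)$ and if $n<k-1$ then $C(n,k) = C(n,n+1)$.

Proof. Assume first of all that $n\ge k-1$. The inequality $C(n,k)\ge C(k-1,k)$ is easy since for every $f_1,\dots,f_k\in L_2(\gamma_{k-1})$ which satisfy $f_j\ge 0$ for all $j\in\{1,\dots,k\}$ and $f_1+\dots+f_k\le 1$ we can define $\tilde f_1,\dots,\tilde f_k:\mathbb{R}^n = \mathbb{R}^{k-1}\times\mathbb{R}^{n-k+1}\to\mathbb{R}$ by $\tilde f_j(x,y) = f_j(x)$. Then $\tilde f_1,\dots,\tilde f_k\in L_2(\gamma_n)$, $\tilde f_1,\dots,\tilde f_k\ge 0$, $\tilde f_1+\dots+\tilde f_k\le 1$ and

$$\sum_{j=1}^k\left\|\int_{\mathbb{R}^{k-1}} x f_j(x)\,d\gamma_{k-1}(x)\right\|_2^2 = \sum_{j=1}^k\left\|\int_{\mathbb{R}^n} x\tilde f_j(x)\,d\gamma_n(x)\right\|_2^2.$$

In the reverse direction, by Lemma 3.1 there is a measurable partition $A_1,\dots,A_k$ of $\mathbb{R}^n$ such that if we define $z_j := \int_{A_j} x\,d\gamma_n(x)\in\mathbb{R}^n$ then we have $\sum_{j=1}^k\|z_j\|_2^2 = C(n,k)$. Note that

$$\sum_{j=1}^k z_j = \int_{\mathbb{R}^n}\left(\sum_{j=1}^k\mathbf{1}_{A_j}\right)x\,d\gamma_n(x) = \int_{\mathbb{R}^n} x\,d\gamma_n(x) = 0.$$

Hence the dimension of the subspace $V := \mathrm{span}\{z_1,\dots,z_k\}$ is $d\le k-1$. Define $g_1,\dots,g_k:V\to[0,1]$ by

$$g_j(x) = \gamma_{V^\perp}\big((A_j - x)\cap V^\perp\big).$$

Then $g_1+\dots+g_k = 1$, so that

$$C(k-1,k) \ge C(d,k) \ge \sum_{j=1}^k\left\|\int_V x g_j(x)\,d\gamma_V(x)\right\|_2^2 = \sum_{j=1}^k\left\|\int_V\int_{V^\perp}\mathbf{1}_{A_j}(x+y)\,x\,d\gamma_V(x)\,d\gamma_{V^\perp}(y)\right\|_2^2 = \sum_{j=1}^k\left\|\int_{A_j}\mathrm{Proj}_V(w)\,d\gamma_n(w)\right\|_2^2 = \sum_{j=1}^k\|z_j\|_2^2 = C(n,k).$$

We now pass to the case $n<k-1$. The inequality $C(n,n+1)\le C(n,k)$ is trivial, so we need to show that $C(n,k)\le C(n,n+1)$. We observe that since $k>n+1$, for every $v_1,\dots,v_k\in\mathbb{R}^n$ there exist two distinct indices $i,j\in\{1,\dots,k\}$ such that $\langle v_i,v_j\rangle\ge 0$. The proof of this fact is by induction on $n$. If $n=1$ then our assumption is that $k\ge 3$, and therefore at least two of the real numbers $v_1,\dots,v_k$ must have the same sign. For $n>1$ we may assume that $\langle v_1,v_j\rangle<0$ for all $j\ge 2$ (otherwise we are done). Consider the vectors

$$\left\{v_j - \frac{\langle v_1,v_j\rangle}{\|v_1\|_2^2}v_1\right\}_{j=2}^k,$$

i.e. the projections of $v_2,\dots,v_k$ onto the orthogonal complement of $v_1$. By induction there are distinct $i,j\in\{2,\dots,k\}$ such that

$$0 \le \left\langle v_i - \frac{\langle v_1,v_i\rangle}{\|v_1\|_2^2}v_1,\ v_j - \frac{\langle v_1,v_j\rangle}{\|v_1\|_2^2}v_1\right\rangle = \langle v_i,v_j\rangle - \frac{\langle v_i,v_1\rangle\langle v_j,v_1\rangle}{\|v_1\|_2^2} \le \langle v_i,v_j\rangle.$$

Now, let $A_1,\dots,A_k$ be a partition of $\mathbb{R}^n$ as in Lemma 3.1 and denote $z_j := \int_{A_j} x\,d\gamma_n(x)\in\mathbb{R}^n$. By the above argument there are distinct $i,j\in\{1,\dots,k\}$ such that $\langle z_i,z_j\rangle\ge 0$. Hence

$$C(n,k-1) \ge \sum_{\substack{1\le\ell\le k\\ \ell\notin\{i,j\}}}\left\|\int_{A_\ell} x\,d\gamma_n(x)\right\|_2^2 + \left\|\int_{A_i\cup A_j} x\,d\gamma_n(x)\right\|_2^2 = \sum_{\substack{1\le\ell\le k\\ \ell\notin\{i,j\}}}\|z_\ell\|_2^2 + \|z_i + z_j\|_2^2 \ge \sum_{\ell=1}^k\|z_\ell\|_2^2 = C(n,k) \ge C(n,k-1).$$

So $C(n,k) = C(n,k-1)$, and the required identity follows by induction. □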



In light of Lemma 3.2 we denote from now on $C(k) := C(k-1,k)$. Given distinct $z_1,\dots,z_k\in\mathbb{R}^{k-1}$ and $j\in\{1,\dots,k\}$ define a set $P_j(z_1,\dots,z_k)\subseteq\mathbb{R}^{k-1}$ by

$$P_j(z_1,\dots,z_k) := \left\{x\in\mathbb{R}^{k-1} :\ \langle x,z_j\rangle = \max_{i\in\{1,\dots,k\}}\langle x,z_i\rangle\right\}.$$

Thus $\{P_j(z_1,\dots,z_k)\}_{j=1}^k$ is a partition of $\mathbb{R}^{k-1}$ which we call the simplicial partition induced by $z_1,\dots,z_k$ (strictly speaking the elements of this partition are not disjoint, but they intersect at sets of measure 0).

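Lemma 3.3. Let $A_1,\dots,A_k\subseteq\mathbb{R}^{k-1}$ be a partition as in Lemma 3.1, i.e. if we set $z_j := \int_{A_j} x\,d\gamma_{k-1}(x)$ then $C(k) = \sum_{j=1}^k\|z_j\|_2^2$. Assume also that this partition is minimal in the sense that the number of elements of positive measure in this partition is minimal among all the possible partitions from Lemma 3.1. By relabeling we may assume without loss of generality that for some $1\le\ell\le k$ we have $\gamma_{k-1}(A_1),\dots,\gamma_{k-1}(A_\ell)>0$ and that $\gamma_{k-1}(A_{\ell+1}) = \dots = \gamma_{k-1}(A_k) = 0$. Then up to an orthogonal transformation $z_1,\dots,z_\ell\in\mathbb{R}^{\ell-1}$, for any distinct $i,j\in\{1,\dots,\ell\}$ we have $\langle z_i,z_j\rangle<0$, and for each $j\in\{1,\dots,\ell\}$ we have $A_j = P_j(z_1,\dots,z_\ell)\times\mathbb{R}^{k-\ell}$ up to sets of measure zero.

Proof. Since $\mathbf{1}_{A_1}+\dots+\mathbf{1}_{A_\ell} = 1$ almost everywhere we have $z_1+\dots+z_\ell = 0$. Thus the dimension of the span of $z_1,\dots,z_\ell$ is at most $\ell-1$, and by applying an orthogonal transformation we may assume that $z_1,\dots,z_\ell\in\mathbb{R}^{\ell-1}$. Also, if for some distinct $i,j\in\{1,\dots,\ell\}$ we have $\langle z_i,z_j\rangle\ge 0$ we may replace $A_i$ by $A_i\cup A_j$ and $A_j$ by the empty set and obtain a partition of $\mathbb{R}^{k-1}$ which contains exactly $\ell-1$ elements of positive measure and for which we have:

$$\sum_{\substack{1\le r\le k\\ r\notin\{i,j\}}}\left\|\int_{A_r} x\,d\gamma_{k-1}(x)\right\|_2^2 + \left\|\int_{A_i\cup A_j} x\,d\gamma_{k-1}(x)\right\|_2^2 = \sum_{\substack{1\le r\le k\\ r\notin\{i,j\}}}\|z_r\|_2^2 + \|z_i + z_j\|_2^2 \ge \sum_{r=1}^k\|z_r\|_2^2 = C(k).$$

This contradicts the minimality of the partition $A_1,\dots,A_k$.

Note that the above reasoning implies in particular that the vectors $z_1,\dots,z_\ell$ are distinct, and therefore $\{P_j(z_1,\dots,z_\ell)\times\mathbb{R}^{k-\ell}\}_{j=1}^{\ell}$ is a partition of $\mathbb{R}^{k-1}$ (up to sets of measure 0). Assume for the sake of contradiction that there exists $i\in\{1,\dots,\ell\}$ such that

$$\gamma_{k-1}\Big(A_i\setminus\big(P_i(z_1,\dots,z_\ell)\times\mathbb{R}^{k-\ell}\big)\Big) > 0.$$

Note that up to sets of measure 0 we have:

$$A_i\setminus\big(P_i(z_1,\dots,z_\ell)\times\mathbb{R}^{k-\ell}\big) = \bigcup_{j\neq i}\ \bigcup_{m=1}^{\infty}\left\{x\in A_i :\ \langle x,z_j\rangle\ge\langle x,z_i\rangle + \frac{1}{m}\right\}.$$

Hence there exist an integer $m\ge 1$ and $j\in\{1,\dots,\ell\}\setminus\{i\}$ such that if we denote $E := \left\{x\in A_i : \langle x,z_j\rangle\ge\langle x,z_i\rangle + \frac{1}{m}\right\}$ then $\gamma_{k-1}(E)>0$. Define a partition $\tilde A_1,\dots,\tilde A_k$ of $\mathbb{R}^{k-1}$ by

$$\tilde A_r := \begin{cases} A_r & r\notin\{i,j\},\\ A_i\setminus E & r = i,\\ A_j\cup E & r = j. \end{cases}$$

Then for $w := \int_E x\,d\gamma_{k-1}(x)$ we have

$$C(k) \ge \sum_{r=1}^k\left\|\int_{\tilde A_r} x\,d\gamma_{k-1}(x)\right\|_2^2 = \sum_{\substack{1\le r\le k\\ r\notin\{i,j\}}}\|z_r\|_2^2 + \|z_i - w\|_2^2 + \|z_j + w\|_2^2 = \sum_{r=1}^k\|z_r\|_2^2 + 2\|w\|_2^2 + 2\langle z_j,w\rangle - 2\langle z_i,w\rangle$$

$$= C(k) + 2\|w\|_2^2 + 2\int_E\big(\langle z_j,x\rangle - \langle z_i,x\rangle\big)\,d\gamma_{k-1}(x) \ge C(k) + \frac{2\gamma_{k-1}(E)}{m} > C(k),$$

a contradiction. □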



Corollary 3.4. We have $C(2) = \frac{1}{\pi}$ and $C(3) = \frac{9}{8\pi}$.

Proof. Note that Lemma 3.3 implies that for each $k\ge 2$ there exists a partition $A_1,\dots,A_k$ of $\mathbb{R}^{k-1}$ such that each $A_i$ is a cone and $C(k) = \sum_{i=1}^k\left\|\int_{A_i} x\,d\gamma_{k-1}(x)\right\|_2^2$. When $k=2$ the only such partition of $\mathbb{R}$ consists of the positive and negative half-lines. Thus

$$C(2) = 2\left(\frac{1}{\sqrt{2\pi}}\int_0^{\infty} x e^{-x^2/2}\,dx\right)^2 = \frac{1}{\pi}.$$

When $k=3$ the partition $A_1,A_2,A_3$ consists of disjoint cones of angles $\alpha_1,\alpha_2,\alpha_3\in[0,2\pi]$, respectively, where $\alpha_1+\alpha_2+\alpha_3 = 2\pi$. Now, for $j\in\{1,2,3\}$ we have

$$\left\|\int_{A_j} x\,d\gamma_2(x)\right\|_2^2 = \left|\frac{1}{2\pi}\int_0^{\infty}\int_{-\alpha_j/2}^{\alpha_j/2} e^{i\theta}r^2e^{-r^2/2}\,d\theta\,dr\right|^2 = \frac{\sin^2(\alpha_j/2)}{2\pi}.$$

Hence

$$C(3) = \frac{1}{2\pi}\max\left\{\sin^2(\alpha_1/2)+\sin^2(\alpha_2/2)+\sin^2(\alpha_3/2) :\ \alpha_1,\alpha_2,\alpha_3\in[0,\pi]\ \wedge\ \alpha_1+\alpha_2+\alpha_3 = 2\pi\right\} = \frac{3}{2\pi}\sin^2\left(\frac{\pi}{3}\right) = \frac{9}{8\pi}, \tag{35}$$

where (35) follows from a simple Lagrange multiplier argument. □
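As a numerical cross-check of the maximization in (35), one can search a grid of cone angles. This is only a sketch: the function names `propeller_value` and `grid_search` and the grid resolution are illustrative choices, and a grid search is a stand-in for (not a replacement of) the Lagrange multiplier argument.

```python
import math

def propeller_value(a1, a2, a3):
    """(1/(2*pi)) * sum_j sin^2(alpha_j / 2) for three cone angles."""
    return sum(math.sin(a / 2.0) ** 2 for a in (a1, a2, a3)) / (2.0 * math.pi)

def grid_search(steps=300):
    """Maximise over a1, a2 on a uniform grid of [0, pi], with a3 = 2*pi - a1 - a2.

    Grid points whose third angle falls outside [0, pi] are skipped.
    """
    best_value, best_angles = -1.0, None
    for i in range(steps + 1):
        a1 = math.pi * i / steps
        for j in range(steps + 1):
            a2 = math.pi * j / steps
            a3 = 2.0 * math.pi - a1 - a2
            if a3 < -1e-12 or a3 > math.pi + 1e-12:
                continue
            value = propeller_value(a1, a2, a3)
            if value > best_value:
                best_value, best_angles = value, (a1, a2, a3)
    return best_value, best_angles

if __name__ == "__main__":
    value, angles = grid_search()
    print(value, 9.0 / (8.0 * math.pi))  # the two numbers should agree closely
    print(angles)                         # all three angles should be near 2*pi/3
```

With `steps` divisible by 3 the point $\alpha_1=\alpha_2=\alpha_3=2\pi/3$ lies on the grid, so the search recovers the value $9/(8\pi)$ up to rounding.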



It is tempting to believe that for every $k\ge 2$, $C(k)$ is attained at a regular simplicial partition, i.e. a partition of $\mathbb{R}^{k-1}$ of the form $\{P_j(v_1,\dots,v_k)\}_{j=1}^k$ where $v_1,\dots,v_k$ are the vertices of the regular simplex in $\mathbb{R}^{k-1}$. This was shown to be true for $k\in\{2,3\}$ in Corollary 3.4. We will now show that this is not the case for $k\ge 4$.

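Lemma 3.5. Let $v_1,v_2,\dots,v_k\in\mathbb{R}^{k-1}$ be vertices of a regular simplex in $\mathbb{R}^{k-1}$, i.e. for each $i\in\{1,\dots,k\}$ we have $\|v_i\|_2 = 1$ and for each distinct $i,j\in\{1,\dots,k\}$ we have $\langle v_i,v_j\rangle = -\frac{1}{k-1}$. Let

$$z_i := \int_{P_i(v_1,\dots,v_k)} x\,d\gamma_{k-1}(x).$$

Then

$$\sum_{i=1}^k\|z_i\|_2^2 = \frac{1}{k-1}\left(\mathbb{E}\left[\max_{j\in\{1,\dots,k\}} g_j\right]\right)^2,$$

where $g_1,g_2,\dots,g_k$ are independent standard Gaussian random variables.

Proof. By symmetry all the $z_i$ have the same length $r>0$ and $z_i$ has the same direction as $v_i$. Thus for all $i$ we have $\langle z_i,v_i\rangle = r$, and $\sum_{i=1}^k\|z_i\|_2^2 = kr^2$. Now,

$$kr = \sum_{i=1}^k\langle z_i,v_i\rangle = \sum_{i=1}^k\int_{P_i(v_1,\dots,v_k)}\langle x,v_i\rangle\,d\gamma_{k-1}(x) = \sum_{i=1}^k\int_{P_i(v_1,\dots,v_k)}\left(\max_{j\in\{1,\dots,k\}}\langle x,v_j\rangle\right)d\gamma_{k-1}(x) = \int_{\mathbb{R}^{k-1}}\left(\max_{j\in\{1,\dots,k\}}\langle x,v_j\rangle\right)d\gamma_{k-1}(x) = \mathbb{E}\left[\max_{j\in\{1,\dots,k\}} h_j\right],$$

where $h_1,\dots,h_k$ are standard Gaussian random variables with covariances $\mathbb{E}[h_ih_j] = \langle v_i,v_j\rangle$. Let $h$ be a standard Gaussian which is independent of $h_1,\dots,h_k$. Then

$$\mathbb{E}\left[\max_{j\in\{1,\dots,k\}} h_j\right] = \mathbb{E}\left[\frac{h}{\sqrt{k-1}}\right] + \mathbb{E}\left[\max_{j\in\{1,\dots,k\}} h_j\right] = \mathbb{E}\left[\max_{j\in\{1,\dots,k\}}\left(\frac{h}{\sqrt{k-1}} + h_j\right)\right] = \mathbb{E}\left[\max_{j\in\{1,\dots,k\}} h_j'\right], \tag{36}$$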



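where we set $h_j' := \frac{h}{\sqrt{k-1}} + h_j$ so that the $h_j'$ are independent Gaussians with mean zero and variance $\frac{k}{k-1}$. The last term in (36) is the same as $\sqrt{\frac{k}{k-1}}\cdot\mathbb{E}\left[\max_{j\in\{1,\dots,k\}} g_j\right]$ where $g_1,\dots,g_k$ are independent standard Gaussians. Hence $kr^2 = \frac{1}{k-1}\left(\mathbb{E}\left[\max_j g_j\right]\right)^2$, as required. □

Corollary 3.6. For $k\ge 2$ denote

$$R(k) := \frac{1}{k-1}\left(\mathbb{E}\left[\max_{j\in\{1,\dots,k\}} g_j\right]\right)^2 = \frac{1}{k-1}\left(\frac{k}{\sqrt{2\pi}}\int_{-\infty}^{\infty} x e^{-x^2/2}\left(\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-y^2/2}\,dy\right)^{k-1}dx\right)^2. \tag{37}$$

Then for every integer $k\in\{2,4,5,\dots\}$ we have $R(k)<R(3) = \frac{9}{8\pi}$. Thus, if $v_1^k,\dots,v_k^k$ are the vertices of the regular simplex in $\mathbb{R}^{k-1}$ then for $k\ge 4$ we have

$$\sum_{j=1}^k\left\|\int_{P_j(v_1^k,\dots,v_k^k)} x\,d\gamma_{k-1}(x)\right\|_2^2 < \sum_{j=1}^3\left\|\int_{P_j(v_1^3,v_2^3,v_3^3)\times\mathbb{R}^{k-3}} x\,d\gamma_{k-1}(x)\right\|_2^2.$$

Proof. It follows from Corollary 3.4 that $R(3) = C(3) = \frac{9}{8\pi}$. We require a crude bound on $R(k)$. An application of Stirling's formula shows that for $p\ge 2$ we have

$$\big(\mathbb{E}|g_1|^p\big)^{1/p} = \left(\frac{2^{p/2}\,\Gamma\left(\frac{p+1}{2}\right)}{\sqrt{\pi}}\right)^{1/p} \le \sqrt{\frac{p}{2}}.$$

Hence

$$R(k) \le \frac{1}{k-1}\left(\mathbb{E}\left[\max_{j} |g_j|\right]\right)^2 \le \frac{1}{k-1}\left(\mathbb{E}\left[\left(\sum_{j=1}^k|g_j|^p\right)^{1/p}\right]\right)^2 \le \frac{1}{k-1}\left(\sum_{j=1}^k\mathbb{E}|g_j|^p\right)^{2/p} \le \frac{1}{k-1}\cdot k^{2/p}\cdot\frac{p}{2}.$$

Choosing $p = 2\log k\ge 2\log 4>2$ we see that

$$R(k) \le \frac{e\log k}{k-1}. \tag{38}$$

The function $k\mapsto\frac{\log k}{k-1}$ is decreasing on $[4,\infty)$, and therefore a direct computation using (38) shows that $R(k)<\frac{9}{8\pi}$ for $k\ge 26$. For $k\le 25$ one can compute numerically (say, using Maple) the integral in (37) and get the following values:

R(4) = 0.3532045529, R(5) = 0.3381215916, R(6) = 0.3211623921, R(7) = 0.3047310600,
R(8) = 0.2895196903, R(9) = 0.2756580116, R(10) = 0.2630844408, R(11) = 0.2516780298,
R(12) = 0.2413075184, R(13) = 0.2318492693, R(14) = 0.2231929784, R(15) = 0.2152425349,
R(16) = 0.2079150401, R(17) = 0.2011392394, R(18) = 0.1948538849, R(19) = 0.1890062248,
R(20) = 0.1835506894, R(21) = 0.1784477705, R(22) = 0.1736630840, R(23) = 0.1691665868,
R(24) = 0.1649319261, R(25) = 0.1609358965.

Since $R(3) = 0.3580986219$ it follows that $R(k)<R(3)$ for every integer $k\in[4,25]$ as well. □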
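The numerical values of $R(k)$ can be reproduced without a computer algebra system. The sketch below is an illustrative alternative to the Maple computation mentioned above, not the authors' code: it evaluates $\mathbb{E}[\max_j g_j] = k\int x\varphi(x)\Phi(x)^{k-1}dx$ by composite Simpson integration using only the standard library. The function names, the truncation of the integral to $[-12,12]$ and the number of steps are assumptions made for this example.

```python
import math

def std_normal_cdf(x):
    """Standard Gaussian distribution function, via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def expected_max(k, lo=-12.0, hi=12.0, steps=20000):
    """E[max of k i.i.d. N(0,1)] = k * int x phi(x) Phi(x)^(k-1) dx (composite Simpson).

    `steps` must be even; the integrand is negligible outside [lo, hi].
    """
    h = (hi - lo) / steps

    def integrand(x):
        density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        return k * x * density * std_normal_cdf(x) ** (k - 1)

    total = integrand(lo) + integrand(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    return total * h / 3.0

def R(k):
    """R(k) = (E[max_j g_j])^2 / (k - 1), as in Corollary 3.6."""
    return expected_max(k) ** 2 / (k - 1)

if __name__ == "__main__":
    for k in range(2, 26):
        print(k, round(R(k), 10))
```

For $k=2$ and $k=3$ the output can be compared with the closed forms $1/\pi$ and $9/(8\pi)$, and for $4\le k\le 25$ with the table above.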



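We conjecture that $C(k)\le C(3)$ for every integer $k\ge 2$. For future reference we end this section with the following alternative characterization of $C(k)$:

Lemma 3.7. We have the following identity:

$$C(k) = \sup\left\{\frac{\left(\mathbb{E}\left[\max_{j\in\{1,\dots,k\}} g_j\right]\right)^2}{\sum_{j=1}^k\mathbb{E}\left[g_j^2\right]} :\ (g_1,\dots,g_k)\in\mathbb{R}^k\ \text{mean zero Gaussian vector}\right\}. \tag{39}$$

Proof. First we show that $C(k)$ is at most the right hand side of (39). We know that there exists a partition $A_1,\dots,A_k$ of $\mathbb{R}^{k-1}$ such that if we write $z_j := \int_{A_j} x\,d\gamma_{k-1}(x)$ then $A_j = P_j(z_1,\dots,z_k)$ for all $j\in\{1,\dots,k\}$ and $C(k) = \sum_{j=1}^k\|z_j\|_2^2$. Now,

$$C(k) = \sum_{j=1}^k\|z_j\|_2^2 = \sum_{j=1}^k\int_{P_j(z_1,\dots,z_k)}\langle x,z_j\rangle\,d\gamma_{k-1}(x) = \sum_{j=1}^k\int_{P_j(z_1,\dots,z_k)}\left(\max_{i\in\{1,\dots,k\}}\langle x,z_i\rangle\right)d\gamma_{k-1}(x) = \int_{\mathbb{R}^{k-1}}\left(\max_{i\in\{1,\dots,k\}}\langle x,z_i\rangle\right)d\gamma_{k-1}(x) = \mathbb{E}\left[\max_{j\in\{1,\dots,k\}} h_j\right], \tag{40}$$

where in (40) $h_1,\dots,h_k$ are mean-zero Gaussians with covariances $\mathbb{E}[h_ih_j] = \langle z_i,z_j\rangle$. Thus

$$C(k) = \frac{\left(\mathbb{E}\left[\max_{j\in\{1,\dots,k\}} h_j\right]\right)^2}{C(k)} = \frac{\left(\mathbb{E}\left[\max_{j\in\{1,\dots,k\}} h_j\right]\right)^2}{\sum_{j=1}^k\mathbb{E}\left[h_j^2\right]},$$

which implies the desired upper bound on $C(k)$.

For the other direction fix a mean zero Gaussian vector $(g_1,\dots,g_k)\in\mathbb{R}^k$ and let $v_1,v_2,\dots,v_k\in\mathbb{R}^k$ be vectors such that $\mathbb{E}[g_ig_j] = \langle v_i,v_j\rangle$ for all $i,j\in\{1,\dots,k\}$. For $i\in\{1,\dots,k\}$ let $w_i := \int_{P_i(v_1,\dots,v_k)} x\,d\gamma_k(x)$, where $P_i(v_1,\dots,v_k)\subseteq\mathbb{R}^k$ is defined as before. Now,

$$\sum_{i=1}^k\langle w_i,v_i\rangle = \sum_{i=1}^k\int_{P_i(v_1,\dots,v_k)}\left(\max_{j\in\{1,\dots,k\}}\langle x,v_j\rangle\right)d\gamma_k(x) = \int_{\mathbb{R}^k}\left(\max_{j\in\{1,\dots,k\}}\langle x,v_j\rangle\right)d\gamma_k(x) = \mathbb{E}\left[\max_{j\in\{1,\dots,k\}} g_j\right].$$

Therefore, since $C(k) = C(k,k)$ by Lemma 3.2, the Cauchy-Schwarz inequality gives

$$C(k) \ge \sum_{i=1}^k\|w_i\|_2^2 \ge \frac{\left(\sum_{i=1}^k\langle w_i,v_i\rangle\right)^2}{\sum_{i=1}^k\|v_i\|_2^2} = \frac{\left(\mathbb{E}\left[\max_{j\in\{1,\dots,k\}} g_j\right]\right)^2}{\sum_{i=1}^k\mathbb{E}\left[g_i^2\right]}.$$

This completes the proof of (39). □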



3.2 Dictatorships vs. functions with small influences 

In what follows all functions are assumed to be measurable and we use the notation $[k] := \{1,\dots,k\}$. In this section we will associate to every function from $\{1,\dots,k\}^n$ to

$$\Delta_k := \left\{x\in\mathbb{R}^k :\ x_i\ge 0\ \ \forall i\in[k],\ \sum_{i=1}^k x_i\le 1\right\}$$

a numerical parameter, or "objective value". We will show that the value of this parameter for functions which depend only on a single coordinate (i.e. dictatorships) differs markedly from its value on functions which do not depend significantly on any particular coordinate (i.e. functions with small influences). This step is an analog of the "dictatorship test" which is prevalent in PCP based hardness proofs.

We begin with some notation and preliminaries on Fourier-type expansions. For any function $f:\mathbb{R}^{k-1}\to\Delta_k$ we write $f = (f_1,f_2,\dots,f_k)$ where $f_i:\mathbb{R}^{k-1}\to[0,1]$ and $\sum_{i=1}^k f_i\le 1$. With this notation we have

$$C(k) = \sup_{f:\mathbb{R}^{k-1}\to\Delta_k}\ \sum_{i=1}^k\left\|\int_{\mathbb{R}^{k-1}} x f_i(x)\,d\gamma_{k-1}(x)\right\|_2^2,$$

where $C(k)$ is as in Section 3.1. We have already seen that the supremum above is actually attained and at the supremum we have $\sum_{i=1}^k f_i = 1$. Also $C(k)$ remains the same if the supremum is taken over functions over $\mathbb{R}^n$ with $n\ge k-1$, i.e. for every $n\ge k-1$,

$$C(k) = \sup_{f:\mathbb{R}^n\to\Delta_k}\ \sum_{i=1}^k\left\|\int_{\mathbb{R}^n} x f_i(x)\,d\gamma_n(x)\right\|_2^2.$$

Let $(\Omega = [k],\mu)$ be a probability space, $\mu$ being the uniform measure. Let $(\Omega^n,\mu^n)$ be the product space. We will be analyzing functions $f:\Omega^n\to\Delta_k$ (and more generally into $\mathbb{R}^k$). Fix a basis of orthonormal random variables on $\Omega$ where one of them is the constant 1, i.e. $\{X_0,X_1,\dots,X_{k-1}\}$ where for all $i$, $X_i:\Omega\to\mathbb{R}$, $X_0\equiv 1$ and $\mathbb{E}_{\omega\in\Omega}[X_i(\omega)X_j(\omega)]$ equals 0 for $i\neq j$ and equals 1 if $i=j$. Then any function $f:\Omega\to\mathbb{R}$ can be written as a linear combination of the $X_i$'s.

In order to analyze functions $f:\Omega^n\to\mathbb{R}$, we let $\mathcal{X} = (\mathcal{X}_1,\mathcal{X}_2,\dots,\mathcal{X}_n)$ be an "ensemble" of random variables where for $1\le i\le n$, $\mathcal{X}_i = \{X_{i,0},X_{i,1},\dots,X_{i,k-1}\}$, and for every $i$, $\{X_{i,j}\}_{j=0}^{k-1}$ are independent copies of the $\{X_j\}_{j=0}^{k-1}$. Any $\sigma = (\sigma_1,\sigma_2,\dots,\sigma_n)\in\{0,1,2,\dots,k-1\}^n$ will be called a multi-index. We shall denote by $|\sigma|$ the number of non-zero entries in $\sigma$. Each multi-index defines a monomial $x_\sigma := \prod_{i\in[n],\,\sigma_i\neq 0} x_{i,\sigma_i}$ on a set of $n(k-1)$ indeterminates $\{x_{ij}\ |\ i\in[n],\ j\in\{1,2,\dots,k-1\}\}$, and also a random variable $X_\sigma:\Omega^n\to\mathbb{R}$ as

$$X_\sigma(\omega) := \prod_{i=1}^n X_{\sigma_i}(\omega_i).$$

It is easy to see that the random variables $\{X_\sigma\}_\sigma$ form an orthonormal basis for the space of functions $f:\Omega^n\to\mathbb{R}$. Thus, every such $f$ can be written uniquely as (the "Fourier expansion")

$$f = \sum_\sigma\hat f(\sigma)X_\sigma,\qquad \hat f(\sigma)\in\mathbb{R}.$$

We denote the corresponding multi-linear polynomial as $Q_f = \sum_\sigma\hat f(\sigma)x_\sigma$. One can think of $f$ as the polynomial $Q_f$ applied to the ensemble $\mathcal{X}$, i.e. $f = Q_f(\mathcal{X})$. Of course, one can also apply $Q_f$ to any other ensemble, and specifically to the Gaussian ensemble $\mathcal{G} = (\mathcal{G}_1,\mathcal{G}_2,\dots,\mathcal{G}_n)$ where $\mathcal{G}_i = \{G_{i,0}\equiv 1,G_{i,1},\dots,G_{i,k-1}\}$ and $G_{i,j}$, $i\in[n]$, $1\le j\le k-1$ are i.i.d. standard Gaussians. Define the influence of the $i$'th variable on $f$ as

$$\mathrm{Inf}_i(f) := \sum_{\sigma:\ \sigma_i\neq 0}\hat f(\sigma)^2.$$

Roughly speaking, the results of [21, 16] say that if $f:\Omega^n\to[0,1]$ is a function with all low influences, then $f = Q_f(\mathcal{X})$ and $Q_f(\mathcal{G})$ are almost identically distributed, and in particular, the values of $Q_f(\mathcal{G})$ are essentially contained in $[0,1]$. Note that $Q_f(\mathcal{G})$ is a random variable on the probability space $(\mathbb{R}^{n(k-1)},\gamma_{n(k-1)})$.

Consider functions $f:\Omega^n\to\Delta_k$. We write $f = (f_1,f_2,\dots,f_k)$ where $f_i:\Omega^n\to[0,1]$ with $\sum_{i=1}^k f_i\le 1$. Each $f_i$ has a unique representation (along with the corresponding multi-linear polynomial)

$$f_i = \sum_\sigma\hat f_i(\sigma)X_\sigma,\qquad Q_i := Q_{f_i} = \sum_\sigma\hat f_i(\sigma)x_\sigma.$$

We shall define an objective function $\mathrm{OBJ}(f)$ that is a positive semi-definite quadratic form on the table of values of $f$. Then we analyze the value of this objective function when $f$ is a "dictatorship" versus when $f$ has all low influences.

The objective value

For a function $f:\Omega^n\to\Delta_k$ (or more generally, $f:\Omega^n\to\mathbb{R}^k$) define

$$\mathrm{OBJ}(f) := \sum_{i=1}^k\ \sum_{\sigma:\ |\sigma|=1}\hat f_i(\sigma)^2. \tag{41}$$

In words, $\mathrm{OBJ}(f)$ is the total "Fourier mass" of all functions $\{f_i\}_{i=1}^k$ at level 1. Note that there are $n(k-1)$ multi-indices $\sigma$ such that $|\sigma| = 1$.
The objective value for dictatorships 

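For $\ell\in[n]$ we define a dictatorship function $f^{\mathrm{dict},\ell}:\Omega^n\to\Delta_k$ as follows. The range of the function is limited to only $k$ points in $\Delta_k$, namely the points $\{e_1,e_2,\dots,e_k\}$ where $e_i$ is a vector with $i$-th coordinate 1 and all other coordinates zero.

$$f^{\mathrm{dict},\ell}(\omega) := e_i\quad\text{if }\omega_\ell = i. \tag{42}$$

In other words, when one writes $f^{\mathrm{dict},\ell} = (f_1,f_2,\dots,f_k)$, $f_i$ is $\{0,1\}$-valued and $f_i(\omega) = 1$ iff $\omega_\ell = i$. It is easy to see that the Fourier expansion of $f_i$ is

$$f_i(\omega) = \frac{1}{k}\sum_{\sigma:\ \sigma_j = 0\ \forall j\neq\ell} X_{\sigma_\ell}(i)\,X_\sigma(\omega). \tag{43}$$

Indeed, the right hand side of (43) equals

$$\frac{1}{k}\sum_{0\le\sigma_\ell\le k-1} X_{\sigma_\ell}(i)\,X_{\sigma_\ell}(\omega_\ell) = \begin{cases} 1 & \text{if }\omega_\ell = i,\\ 0 & \text{otherwise.}\end{cases}$$

The Fourier mass of $f_i$ at level 1 equals

$$\sum_{\sigma_\ell=1}^{k-1}\left(\frac{X_{\sigma_\ell}(i)}{k}\right)^2 = \sum_{\sigma_\ell=0}^{k-1}\frac{1}{k^2}X_{\sigma_\ell}(i)^2 - \left(\frac{X_0(i)}{k}\right)^2 = \frac{1}{k} - \frac{1}{k^2}.$$

Summing the Fourier mass of all $f_i$'s at level 1, we get

$$\mathrm{OBJ}\big(f^{\mathrm{dict},\ell}\big) = 1 - \frac{1}{k}. \tag{44}$$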
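The value $1-1/k$ can be checked by brute force for small $n$ and $k$. The sketch below does not fix a particular orthonormal basis $\{X_0,\dots,X_{k-1}\}$; instead it relies on the standard basis-free fact (an assumption added here, not stated in the text) that the level-1 Fourier mass of $g$ equals $\sum_{j}\mathrm{Var}\big(\mathbb{E}[g\mid\omega_j]\big)$ under the uniform measure. All function names are illustrative, and coordinates and labels are 0-based in the code.

```python
from fractions import Fraction
from itertools import product

def level_one_mass(g, n, k):
    """Level-1 Fourier mass of g: [k]^n -> R under the uniform measure.

    Computed as sum over coordinates j of Var(E[g | omega_j]).
    """
    points = list(product(range(k), repeat=n))
    mean = sum(Fraction(g(w)) for w in points) / len(points)
    mass = Fraction(0)
    for j in range(n):
        for a in range(k):
            fibre = [w for w in points if w[j] == a]
            cond = sum(Fraction(g(w)) for w in fibre) / len(fibre)
            mass += (cond - mean) ** 2 / k  # the event omega_j = a has weight 1/k
    return mass

def obj(f, n, k):
    """OBJ(f): total level-1 mass of the k coordinate functions of f."""
    return sum(level_one_mass(lambda w, i=i: f(w)[i], n, k) for i in range(k))

def dictatorship(ell, k):
    """f^{dict,ell}: omega -> e_i whenever omega_ell = i."""
    return lambda w: tuple(1 if w[ell] == i else 0 for i in range(k))

if __name__ == "__main__":
    for k in (2, 3, 4):
        print(k, obj(dictatorship(0, k), n=3, k=k))  # should equal 1 - 1/k
```

Exact rational arithmetic is used so that the comparison with $1-1/k$ is an equality rather than a floating-point approximation.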

The objective value for functions with low influences

For $f:\Omega^n\to\mathbb{R}$, $j\in[n]$ and $m\in\mathbb{N}$ denote

$$\mathrm{Inf}_j^{\le m}(f) := \sum_{\substack{\sigma:\ |\sigma|\le m\\ \sigma_j\neq 0}}\hat f(\sigma)^2.$$

For every $\eta>0$ we will use the smoothing operator:

$$T_\eta f := \sum_\sigma\eta^{|\sigma|}\hat f(\sigma)X_\sigma.$$

The following theorem is the key analytic fact used in our UGC hardness result:

Theorem 3.8. For every $\varepsilon>0$, there exists $\tau>0$ so that the following holds: for any function $f:\Omega^n\to\Delta_k$ such that

$$\forall i\in[k],\ \forall j\in[n],\quad \mathrm{Inf}_j^{\le\log(1/\tau)}(f_i)\le\tau$$

we have,

$$\mathrm{OBJ}(f)\le C(k)+\varepsilon.$$



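Proof. Let $\delta,\eta>0$ be sufficiently small constants to be chosen later. Let $Q_i = Q_{f_i}$ be the multi-linear polynomial associated with $f_i$. Recall that $Q_i$ is a multi-linear polynomial in $n(k-1)$ indeterminates $\{x_{j\ell}\ |\ j\in[n],\ \ell\in[k-1]\}$. Moreover $f_i = Q_i(\mathcal{X})$ has range $[0,1]$ and $\sum_{i=1}^k f_i\le 1$.

Let $R_i = (T_{1-\delta}Q_i)(\mathcal{X})$ and $S_i = (T_{1-\delta}Q_i)(\mathcal{G})$ (the smoothing operator $T_{1-\delta}$ helps us meet some technical pre-conditions before applying the invariance principle of [16]). Note that $R_i$ has range $[0,1]$ and $S_i$ has range $\mathbb{R}$. It will follow however from [16] that $S_i$ is with high probability in $[0,1]$. First we relate $\mathrm{OBJ}(f)$ to the functions $S_i$ which will, up to truncation, induce a partition of $\mathbb{R}^{n(k-1)}$ which in turn will give the bound in terms of $C(k)$.

$$\mathrm{OBJ}(f) = \sum_{i=1}^k\sum_{\sigma:|\sigma|=1}\hat f_i(\sigma)^2 = \sum_{i=1}^k\sum_{j=1}^n\sum_{\ell=1}^{k-1}\left(\int_{\mathbb{R}^{n(k-1)}} x_{j\ell}\,Q_i(x)\,d\gamma_{n(k-1)}(x)\right)^2 = \sum_{i=1}^k\left\|\int_{\mathbb{R}^{n(k-1)}} x\,Q_i(x)\,d\gamma_{n(k-1)}(x)\right\|_2^2$$

$$= \frac{1}{(1-\delta)^2}\sum_{i=1}^k\left\|\int_{\mathbb{R}^{n(k-1)}} x\,(T_{1-\delta}Q_i)(x)\,d\gamma_{n(k-1)}(x)\right\|_2^2 = \frac{1}{(1-\delta)^2}\sum_{i=1}^k\left\|\int_{\mathbb{R}^{n(k-1)}} x\,S_i(x)\,d\gamma_{n(k-1)}(x)\right\|_2^2. \tag{45}$$

We bound the last sum by $C(k)+o(1)$. For any real-valued function $h$ on $\mathbb{R}^{n(k-1)}$, let

$$\mathrm{chop}(h)(x) := \begin{cases} 0 & \text{if } h(x)<0,\\ h(x) & \text{if } h(x)\in[0,1],\\ 1 & \text{if } h(x)>1.\end{cases}$$

For every subset $I\subseteq[k]$, let $Q_I := \sum_{i\in I}Q_i$. Since every $Q_i$ has small low-degree influence, so does every $Q_I$. Let $R_I := \sum_{i\in I}(T_{1-\delta}Q_i)(\mathcal{X})$, and $S_I := \sum_{i\in I}(T_{1-\delta}Q_i)(\mathcal{G})$. Note that $R_{\{i\}} = R_i$ and $S_{\{i\}} = S_i$. Applying Theorem 3.20 in [16] to the polynomial $Q_I$, it follows that (provided $\tau$ is sufficiently small compared to $\delta$ and $\eta$),

$$\|S_I - \mathrm{chop}(S_I)\|_2^2 = \int_{\mathbb{R}^{n(k-1)}}|S_I(x) - \mathrm{chop}(S_I)(x)|^2\,d\gamma_{n(k-1)}(x)\le\eta. \tag{46}$$

The functions $\mathrm{chop}(S_i)$ are almost what we want except that they might not sum up to at most 1. So further define

$$S_i^*(x) := \begin{cases}\mathrm{chop}(S_i)(x) & \text{if }\sum_{j=1}^k\mathrm{chop}(S_j)(x)\le 1,\\[4pt] \dfrac{\mathrm{chop}(S_i)(x)}{\sum_{j=1}^k\mathrm{chop}(S_j)(x)} & \text{if }\sum_{j=1}^k\mathrm{chop}(S_j)(x)>1.\end{cases}$$

Clearly, the $S_i^*$ have range $[0,1]$ and $\sum_{i=1}^k S_i^*\le 1$. Observe that the following holds point-wise:

$$0\le\mathrm{chop}(S_i) - S_i^*\le\sum_{j=1}^k\big(\mathrm{chop}(S_j) - S_j^*\big)\le\max\left\{0,\ \sum_{j=1}^k\mathrm{chop}(S_j) - 1\right\}\le\sum_{I\subseteq[k]}|S_I - \mathrm{chop}(S_I)|,$$

where the last inequality holds since for every $x$, by defining $I = I(x) = \{j : S_j(x)>0\}$,

$$\sum_{j=1}^k\mathrm{chop}(S_j)(x) - 1 = \sum_{j\in I}\mathrm{chop}(S_j)(x) - 1\le\sum_{j\in I}S_j(x) - 1\le|S_I(x) - \mathrm{chop}(S_I)(x)|.$$

It follows that

$$\|\mathrm{chop}(S_i) - S_i^*\|_2\le\sum_{I\subseteq[k]}\|S_I - \mathrm{chop}(S_I)\|_2\le 2^k\sqrt{\eta},$$

where we used (46). Finally,

$$\|S_i - S_i^*\|_2\le\|S_i - \mathrm{chop}(S_i)\|_2 + \|\mathrm{chop}(S_i) - S_i^*\|_2\le(2^k+1)\sqrt{\eta}. \tag{47}$$

Now write

$$\int_{\mathbb{R}^{n(k-1)}} x\,S_i(x)\,d\gamma_{n(k-1)}(x) = \int_{\mathbb{R}^{n(k-1)}} x\,S_i^*(x)\,d\gamma_{n(k-1)}(x) + \int_{\mathbb{R}^{n(k-1)}} x\big(S_i(x) - S_i^*(x)\big)\,d\gamma_{n(k-1)}(x). \tag{48}$$

The norm of the second integral is bounded by $(2^k+1)\sqrt{\eta}$ using (47) and Lemma 3.9 below. Since $\|S_i^*\|_2\le 1$, the norm of the first integral is bounded by 1, and thus

$$\left\|\int_{\mathbb{R}^{n(k-1)}} x\,S_i(x)\,d\gamma_{n(k-1)}(x)\right\|_2^2\le\left\|\int_{\mathbb{R}^{n(k-1)}} x\,S_i^*(x)\,d\gamma_{n(k-1)}(x)\right\|_2^2 + 2(2^k+1)\sqrt{\eta} + (2^k+1)^2\eta.$$

Returning to the estimation in Equation (45) and noting that $\sum_{i=1}^k S_i^*\le 1$ (and assuming, as we may, that $\eta\le 1$),

$$\sum_{i=1}^k\left\|\int_{\mathbb{R}^{n(k-1)}} x\,S_i(x)\,d\gamma_{n(k-1)}(x)\right\|_2^2\le\sum_{i=1}^k\left\|\int_{\mathbb{R}^{n(k-1)}} x\,S_i^*(x)\,d\gamma_{n(k-1)}(x)\right\|_2^2 + k\Big(2(2^k+1)\sqrt{\eta} + (2^k+1)^2\eta\Big)$$

$$\le\sup_{f:\mathbb{R}^{n(k-1)}\to\Delta_k}\ \sum_{i=1}^k\left\|\int_{\mathbb{R}^{n(k-1)}} x\,f_i(x)\,d\gamma_{n(k-1)}(x)\right\|_2^2 + 2(2^k+1)^3\sqrt{\eta} = C(k) + 2(2^k+1)^3\sqrt{\eta}.$$

It follows that $\mathrm{OBJ}(f)\le\frac{C(k) + 2(2^k+1)^3\sqrt{\eta}}{(1-\delta)^2}\le C(k)+\varepsilon$ provided that $\eta$ and $\delta$ are small enough. □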
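The truncation and renormalisation step in the proof above acts point-wise, so it can be illustrated on a single vector of values $(S_1(x),\dots,S_k(x))$. The following sketch uses illustrative names (`chop`, `normalise`, `subset_error`, `check_pointwise`) and checks the three point-wise facts used in the proof: $S_i^*\in[0,1]$, $\sum_i S_i^*\le 1$, and $0\le\mathrm{chop}(S_i)-S_i^*\le\sum_{I\subseteq[k]}|S_I-\mathrm{chop}(S_I)|$.

```python
from itertools import combinations

def chop(h):
    """Truncate a real number to the interval [0, 1]."""
    return min(1.0, max(0.0, h))

def normalise(values):
    """Map (S_1(x), ..., S_k(x)) to (S_1^*(x), ..., S_k^*(x))."""
    chopped = [chop(s) for s in values]
    total = sum(chopped)
    if total <= 1.0:
        return chopped
    return [c / total for c in chopped]

def subset_error(values):
    """Sum over all subsets I of |S_I - chop(S_I)|, where S_I = sum_{i in I} S_i."""
    k = len(values)
    err = 0.0
    for r in range(k + 1):
        for subset in combinations(range(k), r):
            s = sum(values[i] for i in subset)
            err += abs(s - chop(s))
    return err

def check_pointwise(values, tol=1e-12):
    """True if the point-wise claims of the proof hold for this vector of values."""
    chopped = [chop(s) for s in values]
    star = normalise(values)
    bound = subset_error(values)
    gaps = [c - t for c, t in zip(chopped, star)]
    return (sum(star) <= 1.0 + tol
            and all(0.0 <= t <= 1.0 for t in star)
            and all(-tol <= g <= bound + tol for g in gaps))

if __name__ == "__main__":
    print(check_pointwise([0.7, 0.6, -0.2, 1.3]))
    print(normalise([0.2, 0.3, 0.1]))  # already sums to at most 1, so unchanged
```

The cost of `subset_error` is exponential in $k$, mirroring the factor $2^k$ that appears in the bound (47).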
Lemma 3.9. Let $g\in L_2(\mathbb{R}^n,\gamma_n)$. Then

$$\left\|\int_{\mathbb{R}^n} x\,g(x)\,d\gamma_n(x)\right\|_2\le\|g\|_2.$$

Proof. Note that the square of the left hand side equals

$$\sum_{i=1}^n\left(\int_{\mathbb{R}^n} x_i\,g(x)\,d\gamma_n(x)\right)^2 = \sum_{i=1}^n\langle x_i,g\rangle^2.$$

Since the coordinate functions $x_i\in L_2(\mathbb{R}^n,\gamma_n)$ are an orthonormal set of functions, the sum of squares of projections of $g$ onto them is at most the squared norm of $g$. □

The intended hardness factor 

As we show next, the dictatorship test can be translated (in a more or less standard way by now) into a UGC-hardness result. The hardness factor (as usual) turns out to be the ratio of the objective value when the function is a dictatorship to its value when the function has all low influences, i.e.

$$\frac{1-1/k}{C(k)+o(1)} = \frac{1-1/k}{C(k)} - o(1).$$



3.3 The reduction from unique games to kernel clustering 

Given a Unique Games instance $\mathcal{L}\big(G(V,W,E),[n],\{\pi_{vw}:[n]\to[n]\}_{(v,w)\in E}\big)$, we construct an instance of the clustering problem. We first reformulate the kernel clustering problem for the ease of presentation.



Reformulation of the problem 

Given an instance of the kernel clustering problem $(A = (a_{st}), B = (b_{ij}))$ where $A$ and $B$ are $N\times N$ and $k\times k$ PSD matrices respectively, we note that

$$\max_{\sigma:[N]\to[k]}\sum_{s,t}a_{st}\,b_{\sigma(s)\sigma(t)} = \max_{F:[N]\to\Delta_k}\sum_{s,t}a_{st}\sum_{i,j}b_{ij}\,F(s)_iF(t)_j \tag{49}$$

$$= \max_{F:[N]\to\Delta_k}\sum_{i,j}b_{ij}\sum_{s,t}a_{st}\,F_i(s)F_j(t) \tag{50}$$

$$= \max_{F:[N]\to\Delta_k}\sum_{i,j}b_{ij}\,Q_A(F_i,F_j), \tag{51}$$

where on line (49), instead of choosing a label $\sigma(s)\in[k]$, we allow a distribution over the $k$ labels $F(s)\in\Delta_k$. The equality follows since any such probabilistic labeling $F$ yields a labeling $\sigma$ with the same expected objective value by picking, for every $s\in[N]$, a label $i$ with probability $F(s)_i$. On line (50) we interchanged the order of summation and interpreted the $i$-th coordinate of $F(s)$ (i.e. $F(s)_i$) as the value of a function $F_i:[N]\to[0,1]$ at index $s$ (i.e. $F_i(s)$). Thus $F = (F_1,F_2,\dots,F_k)$. On line (51) we rewrote $\sum_{s,t}a_{st}F_i(s)F_j(t)$ as a PSD quadratic form $Q_A(F_i,F_j)$ on the tables of values of functions $F_i$ and $F_j$.

This enables us to reformulate the clustering problem as: Given a PSD matrix $B$, and a PSD quadratic form $Q(\cdot,\cdot)$ on $\mathbb{R}^N\times\mathbb{R}^N$, find $F:[N]\to\Delta_k$, $F = (F_1,F_2,\dots,F_k)$, so as to maximize $\sum_{i,j}b_{ij}Q(F_i,F_j)$.
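To make the reformulation concrete, the sketch below evaluates both sides for an integral labeling: the original objective $\sum_{s,t}a_{st}b_{\sigma(s)\sigma(t)}$ and the quadratic-form objective $\sum_{i,j}b_{ij}Q_A(F_i,F_j)$ with $F$ the one-hot encoding of $\sigma$. The function names and the small matrices in the usage example are illustrative assumptions; labels are 0-based in the code.

```python
from itertools import product

def labeling_objective(A, B, sigma):
    """sum_{s,t} a_st * b_{sigma(s) sigma(t)} for a labeling sigma: [N] -> [k]."""
    n = len(A)
    return sum(A[s][t] * B[sigma[s]][sigma[t]] for s in range(n) for t in range(n))

def fractional_objective(A, B, F):
    """sum_{i,j} b_ij * Q_A(F_i, F_j), where F[s][i] = F(s)_i = F_i(s)."""
    n, k = len(A), len(B)

    def Q(i, j):
        # Q_A(F_i, F_j) = sum_{s,t} a_st F_i(s) F_j(t)
        return sum(A[s][t] * F[s][i] * F[t][j] for s in range(n) for t in range(n))

    return sum(B[i][j] * Q(i, j) for i in range(k) for j in range(k))

def one_hot(sigma, k):
    """Encode a labeling as F: [N] -> Delta_k taking values in {e_1, ..., e_k}."""
    return [[1 if sigma[s] == i else 0 for i in range(k)] for s in range(len(sigma))]

def best_labeling(A, B):
    """Brute-force maximum of the labeling objective (exponential in N)."""
    n, k = len(A), len(B)
    return max(labeling_objective(A, B, sigma) for sigma in product(range(k), repeat=n))

if __name__ == "__main__":
    A = [[2, -1, -1], [-1, 2, -1], [-1, -1, 2]]  # a small centered PSD matrix
    B = [[1, 0], [0, 1]]                         # the 2 x 2 identity
    print(best_labeling(A, B))
```

For one-hot $F$ the two objectives agree term by term, which is the content of the step from the left hand side of (49) to (51) restricted to integral labelings.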
The clustering problem instance

Given a Unique Games instance

$$\mathcal{L}\big(G(V,W,E),[n],\{\pi_{vw}:[n]\to[n]\}_{(v,w)\in E}\big),$$

the clustering problem is to find $F:W\times\Omega^n\to\Delta_k$ so as to maximize $\sum_{i=1}^k Q(F_i,F_i)$ where $Q$ is a suitably defined PSD quadratic form. Thus the matrix $B$ is the $k\times k$ identity matrix. For notational convenience, we let

$$F_w := F(w,\cdot),\qquad F_w:\Omega^n\to\Delta_k.$$

Also, for every $v\in V$, we let

$$F_v := \mathbb{E}_{(v,w)\in E}\left[F_w\circ\pi_{vw}\right],\qquad F_v:\Omega^n\to\Delta_k.$$

We used the following notation: for any function $g:\Omega^n\to\Delta_k$ and $\pi:[n]\to[n]$, $g\circ\pi:\Omega^n\to\Delta_k$ denotes the function

$$(g\circ\pi)(\omega) := g\big(\omega_{\pi(1)},\omega_{\pi(2)},\dots,\omega_{\pi(n)}\big).$$

As usual, we denote $F_w = (F_{w,1},F_{w,2},\dots,F_{w,k})$ where each $F_{w,i}$ has range $[0,1]$ and $\sum_{i=1}^k F_{w,i}\le 1$. Similarly, $F_v = (F_{v,1},F_{v,2},\dots,F_{v,k})$. Now we are ready to define the clustering problem instance.

Clustering instance: The goal is to find $F:W\times\Omega^n\to\Delta_k$ so as to maximize:

$$\max_{F:W\times\Omega^n\to\Delta_k}\mathbb{E}_{v\in V}\left[\mathrm{OBJ}(F_v)\right] = \max_{F:W\times\Omega^n\to\Delta_k}\ \sum_{i=1}^k\mathbb{E}_{v\in V}\left[\sum_{\sigma:|\sigma|=1}\hat F_{v,i}(\sigma)^2\right]. \tag{52}$$



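Completeness

We will show that if the Unique Games instance has an almost satisfying labeling, then the objective value of the clustering problem is at least $(1-o(1))\cdot(1-1/k)$. So, let $\rho:V\cup W\to[n]$ be the labeling, such that for at least $1-\varepsilon$ fraction of the vertices $v\in V$ (call such $v$ good) we have

$$\pi_{vw}(\rho(w)) = \rho(v)\qquad\forall (v,w)\in E.$$

Define $F:W\times\Omega^n\to\Delta_k$ as follows: for every $w\in W$, $F_w:\Omega^n\to\Delta_k$ equals the dictatorship for $\rho(w)\in[n]$, i.e.

$$F_w := f^{\mathrm{dict},\rho(w)}.$$

Lemma 3.10. $f^{\mathrm{dict},i}\circ\pi = f^{\mathrm{dict},\pi(i)}$.

Proof. $f^{\mathrm{dict},\pi(i)}(\omega)$ equals $e_\ell$ if $\omega_{\pi(i)} = \ell$. On the other hand

$$\big(f^{\mathrm{dict},i}\circ\pi\big)(\omega) = f^{\mathrm{dict},i}\big(\omega_{\pi(1)},\dots,\omega_{\pi(n)}\big),$$

which equals $e_\ell$ since $\omega_{\pi(i)} = \ell$. □

Lemma 3.11. For a good $v\in V$, $F_v = f^{\mathrm{dict},\rho(v)}$.

Proof. For a good $v$, $\pi_{vw}(\rho(w)) = \rho(v)$ for every $(v,w)\in E$. Thus

$$F_v = \mathbb{E}_{(v,w)\in E}\left[F_w\circ\pi_{vw}\right] = \mathbb{E}_{(v,w)\in E}\left[f^{\mathrm{dict},\rho(w)}\circ\pi_{vw}\right] = \mathbb{E}_{(v,w)\in E}\left[f^{\mathrm{dict},\pi_{vw}(\rho(w))}\right] = f^{\mathrm{dict},\rho(v)}.\ □$$

Thus the contribution of $v$ in (52) is $\mathrm{OBJ}\big(f^{\mathrm{dict},\rho(v)}\big) = 1-1/k$ as observed in Equation (44). Since $1-\varepsilon$ fraction of $v\in V$ are good, (52) is at least $(1-\varepsilon)\cdot(1-1/k)$.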
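Lemma 3.10 is a statement about finitely many functions when $n$ and $k$ are small, so it can be verified exhaustively. The sketch below assumes the composition convention $(g\circ\pi)(\omega) = g(\omega_{\pi(1)},\dots,\omega_{\pi(n)})$, which is the one used in the proof of Lemma 3.10; the function names are illustrative and indices are 0-based.

```python
from itertools import permutations, product

def dictatorship(ell, k):
    """f^{dict,ell}: omega -> e_i whenever omega_ell = i."""
    return lambda w: tuple(1 if w[ell] == i else 0 for i in range(k))

def compose(g, pi):
    """(g o pi)(omega) = g(omega_{pi(0)}, ..., omega_{pi(n-1)})."""
    return lambda w: g(tuple(w[pi[i]] for i in range(len(pi))))

def lemma_3_10_holds(n, k):
    """Check f^{dict,i} o pi == f^{dict,pi(i)} for every permutation pi and every i."""
    for pi in permutations(range(n)):
        for i in range(n):
            lhs = compose(dictatorship(i, k), pi)
            rhs = dictatorship(pi[i], k)
            if any(lhs(w) != rhs(w) for w in product(range(k), repeat=n)):
                return False
    return True

if __name__ == "__main__":
    print(lemma_3_10_holds(3, 3))
```

The same `compose` helper can be used to confirm Lemma 3.11 on a toy instance by averaging `compose(dictatorship(rho_w, k), pi_vw)` over the neighbours of a good vertex.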

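Soundness

Suppose on the contrary that the value of (52) is at least $C(k)+2\varepsilon$. We will prove that the Unique Games instance must have a labeling that satisfies at least $\frac{\varepsilon\tau^2}{4k\log(1/\tau)}$ fraction of its edges, reaching a contradiction, provided its soundness is chosen to be lower to begin with.

We define a labeling as follows. First we define a not-too-large set of labels $L(w)\subseteq[n]$ for every $w\in W$. Let $\tau$ be as in Theorem 3.8, and set

$$L(w) := \left\{j\in[n]\ \Big|\ \exists i\in[k],\ \mathrm{Inf}_j^{\le\log(1/\tau)}(F_{w,i})\ge\tau/2\right\}.$$

Clearly, $|L(w)|\le\frac{2k\log(1/\tau)}{\tau}$ since each $F_{w,i}$ has range $[0,1]$ and therefore the sum of all degree-$\log(1/\tau)$ influences is at most $\log(1/\tau)$.

Recall that the value of (52) is assumed to be at least $C(k)+2\varepsilon$. By an averaging argument, for at least $\varepsilon$ fraction of $v\in V$ (call such $v$ nice), $\mathrm{OBJ}(F_v)\ge C(k)+\varepsilon$. Applying Theorem 3.8 we conclude that there exist $i_0\in[k]$, $j_0\in[n]$ such that $\mathrm{Inf}_{j_0}^{\le\log(1/\tau)}(F_{v,i_0})>\tau$. Observe that

$$\tau < \mathrm{Inf}_{j_0}^{\le\log(1/\tau)}(F_{v,i_0}) = \mathrm{Inf}_{j_0}^{\le\log(1/\tau)}\Big(\mathbb{E}_{(v,w)\in E}\left[F_{w,i_0}\circ\pi_{vw}\right]\Big)$$

$$\le\mathbb{E}_{(v,w)\in E}\left[\mathrm{Inf}_{j_0}^{\le\log(1/\tau)}\big(F_{w,i_0}\circ\pi_{vw}\big)\right]\qquad\text{(using Lemma 3.12 below)}$$

$$= \mathbb{E}_{(v,w)\in E}\left[\mathrm{Inf}_{\pi_{vw}^{-1}(j_0)}^{\le\log(1/\tau)}\big(F_{w,i_0}\big)\right]\qquad\text{(using Lemma 3.14 below).}$$

This implies that for at least $\frac{\tau}{2}$ fraction of $w$ such that $(v,w)\in E$, we have $\frac{\tau}{2}\le\mathrm{Inf}_{\pi_{vw}^{-1}(j_0)}^{\le\log(1/\tau)}(F_{w,i_0})$. Thus $\pi_{vw}^{-1}(j_0)\in L(w)$ by the definition of $L(w)$. Define $j_0$ to be the label of $v$. Finally, for every $w\in W$, select a random label from $L(w)$ (or an arbitrary label if $L(w) = \emptyset$). Noting that $\varepsilon$ fraction of $v\in V$ are nice, and $|L(w)|\le\frac{2k\log(1/\tau)}{\tau}$, it follows that the labeling satisfies $\varepsilon\cdot\frac{\tau}{2}\cdot\frac{\tau}{2k\log(1/\tau)} = \frac{\varepsilon\tau^2}{4k\log(1/\tau)}$ fraction of the edges of the Unique Games instance.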

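Lemma 3.12. Suppose $\mathcal{C}$ is a class of functions $g:\Omega^n\to\mathbb{R}$ and $h := \mathbb{E}_{g\in\mathcal{C}}[g]$. Then for any $j\in[n]$ and integer $d$,

$$\mathrm{Inf}_j(h)\le\mathbb{E}_{g\in\mathcal{C}}\left[\mathrm{Inf}_j(g)\right],\qquad \mathrm{Inf}_j^{\le d}(h)\le\mathbb{E}_{g\in\mathcal{C}}\left[\mathrm{Inf}_j^{\le d}(g)\right].$$

Proof. We prove the first inequality, the second is similar by restricting summations to multi-indices $|\sigma|\le d$.

$$\mathrm{Inf}_j(h) := \sum_{\sigma:\ \sigma_j\neq 0}\hat h(\sigma)^2 = \sum_{\sigma:\ \sigma_j\neq 0}\Big(\mathbb{E}_{g\in\mathcal{C}}\left[\hat g(\sigma)\right]\Big)^2\le\sum_{\sigma:\ \sigma_j\neq 0}\mathbb{E}_{g\in\mathcal{C}}\left[\hat g(\sigma)^2\right] = \mathbb{E}_{g\in\mathcal{C}}\left[\mathrm{Inf}_j(g)\right].\ □$$

Lemma 3.13. Suppose $g:\Omega^n\to\mathbb{R}$, $\pi:[n]\to[n]$ is a permutation and let $\sigma$ be a multi-index. Then

$$\widehat{g\circ\pi}(\sigma) = \hat g(\sigma\circ\pi),\qquad\text{where }\sigma\circ\pi := \big(\sigma_{\pi(1)},\dots,\sigma_{\pi(n)}\big).$$

Proof. The proof is a straightforward computation which we omit. □

Lemma 3.14. Suppose $g:\Omega^n\to\mathbb{R}$, $\pi:[n]\to[n]$ is a permutation and $j\in[n]$. Then

$$\mathrm{Inf}_j(g\circ\pi) = \mathrm{Inf}_{\pi^{-1}(j)}(g),\qquad \mathrm{Inf}_j^{\le d}(g\circ\pi) = \mathrm{Inf}_{\pi^{-1}(j)}^{\le d}(g).$$

Proof. We prove the first equality, the second is similar by restricting summations to multi-indices $|\sigma|\le d$.

$$\mathrm{Inf}_j(g\circ\pi) := \sum_{\sigma:\ \sigma_j\neq 0}\widehat{g\circ\pi}(\sigma)^2 = \sum_{\sigma:\ \sigma_j\neq 0}\hat g(\sigma\circ\pi)^2 = \sum_{\sigma':\ \sigma'_{\pi^{-1}(j)}\neq 0}\hat g(\sigma')^2 = \mathrm{Inf}_{\pi^{-1}(j)}(g).\ □$$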